simplifying foreign key/RI checks
While discussing the topic of foreign key performance off-list with
Robert and Corey (also came up briefly on the list recently [1], [2]),
a few ideas were thrown around to simplify our current system of RI
checks to enforce foreign keys with the aim of reducing some of its
overheads. The two main aspects of how we do these checks that
seemingly cause the most overhead are:
* Using row-level triggers that are fired during the modification of
the referencing and the referenced relations to perform them
* Using plain SQL queries issued over SPI
There is a discussion nearby titled "More efficient RI checks - take
2" [2] to address this problem from the viewpoint that it is using
row-level triggers that causes the most overhead, although there are
some posts mentioning that SQL-over-SPI is not without blame here. I
decided to focus on the latter aspect and tried reimplementing some
checks such that SPI can be skipped altogether.
I started with the check that's performed when inserting into or
updating the referencing table to confirm that the new row points to a
valid row in the referenced relation. The corresponding SQL is this:
SELECT 1 FROM pk_rel x WHERE x.pkey = $1 FOR KEY SHARE OF x
$1 is the value of the foreign key of the new row. If the query
returns a row, all good. Thanks to SPI, or its use of plan caching,
the query is re-planned only a handful of times before making a
generic plan that is then saved and reused, which looks like this:
QUERY PLAN
--------------------------------------
LockRows
-> Index Scan using pk_pkey on pk x
Index Cond: (a = $1)
(3 rows)
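That generic plan can be reproduced in psql, for example, like this
(ri_check is a made-up statement name; plan_cache_mode, available
since v12, forces the generic plan):

PREPARE ri_check (numeric) AS
  SELECT 1 FROM pk x WHERE x.a = $1 FOR KEY SHARE OF x;
SET plan_cache_mode = force_generic_plan;
EXPLAIN (COSTS OFF) EXECUTE ri_check (1);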
So in most cases, the trigger's function would only execute the plan
that's already there, at least in a given session. That's good but
what we realized would be even better is if we didn't have to
"execute" a full-fledged "plan" for this, that is, if we could simply
find out whether a row containing the key we're looking for exists in
the referenced relation and, if found, lock it. Directly scanning the
index and locking the found tuple with table_tuple_lock(), as
ExecLockRows() does, gives us exactly that behavior, which seems
simple enough to be done in a not-so-long local function in
ri_triggers.c. I gave that a
try and came up with the attached. It also takes care of the case
where the referenced relation is partitioned, in which case the
appropriate leaf partition's index is scanned.
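In condensed form, the new lookup does roughly the following
(illustrative sketch only, with a made-up function name; the attached
patch's ri_PrimaryKeyExists() additionally handles ScanKey setup,
cross-type casts, serialization failures, and partitioned tables):

static bool
ri_KeyExists(Relation pk_rel, Relation idxrel, ScanKeyData *skey, int nkeys)
{
	IndexScanDesc scan;
	TupleTableSlot *slot;
	bool		found = false;

	/* Make the current command's prior changes visible to the scan. */
	CommandCounterIncrement();
	PushActiveSnapshot(GetTransactionSnapshot());

	scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), nkeys, 0);
	index_rescan(scan, skey, nkeys, NULL, 0);
	slot = table_slot_create(pk_rel, NULL);

	if (index_getnext_slot(scan, ForwardScanDirection, slot))
	{
		TM_FailureData tmfd;

		/* Lock the found tuple the way FOR KEY SHARE would. */
		found = (table_tuple_lock(pk_rel, &slot->tts_tid,
								  GetActiveSnapshot(), slot,
								  GetCurrentCommandId(false),
								  LockTupleKeyShare, LockWaitBlock,
								  TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
								  &tmfd) == TM_Ok);
	}

	index_endscan(scan);
	ExecDropSingleTupleTableSlot(slot);
	PopActiveSnapshot();

	return found;
}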
The patch results in ~2x improvement in the performance of inserts and
updates on referencing tables:
create table p (a numeric primary key);
insert into p select generate_series(1, 1000000);
create table f (a bigint references p);
-- unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6340.733 ms (00:06.341)
update f set a = a + 1;
UPDATE 1000000
Time: 7490.906 ms (00:07.491)
-- patched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3340.808 ms (00:03.341)
update f set a = a + 1;
UPDATE 1000000
Time: 4178.171 ms (00:04.178)
The improvement is even more dramatic when the referenced table (that
we're no longer querying over SPI) is partitioned. Here are the
numbers when the PK relation has 1000 hash partitions.
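(The partitioned setup is not shown here; a reconstruction along
these lines, using psql's \gexec to create the 1000 partitions,
should reproduce the scenario:

create table p (a numeric primary key) partition by hash (a);
select format('create table p%s partition of p for values with (modulus 1000, remainder %s)', i, i)
from generate_series(0, 999) i
\gexec
-- 2000000 rows so that "update f set a = a + 1" also finds its keys
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);
)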
Unpatched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 35898.783 ms (00:35.899)
update f set a = a + 1;
UPDATE 1000000
Time: 37736.294 ms (00:37.736)
Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 5633.377 ms (00:05.633)
update f set a = a + 1;
UPDATE 1000000
Time: 6345.029 ms (00:06.345)
That's more than a 5x improvement!
While the above case seemed straightforward enough for skipping SPI,
it seems a bit hard to do the same for other cases where we query the
*referencing* relation during an operation on the referenced table
(for example, checking if the row being deleted is still referenced),
because the plan in those cases is not predictably an index scan.
Also, the filters in those queries are more than likely to not match
the partition key of a partitioned referencing relation, so all
partitions will have to be scanned. I have left those cases as future
work.
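(For reference, the check on the referencing side is currently
implemented with a query of roughly this form, where $1 comes from
the old row of the referenced table:

SELECT 1 FROM [ONLY] fk_rel x WHERE $1 = x.fkey FOR KEY SHARE OF x
)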
The patch seems simple enough to consider for inclusion in v14 unless
of course we stumble into some dealbreaker(s). I will add this to the
March CF.
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/CADkLM=cTt_8Fg1Jtij5j+QEBOxz9Cuu4DiMDYOwdtktDAKzuLw@mail.gmail.com
[2]: /messages/by-id/1813.1586363881@antos
Attachments:
v1-0001-Export-get_partition_for_tuple.patch
From dbfffa276791cecda39c423006caa4a2bd7d4493 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v1 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a0a9..84e50ee7c8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -330,7 +328,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1292,13 +1292,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
v1-0002-Avoid-using-SPI-for-some-RI-checks.patch
From 91c7883944f03882ca9008f21fc2535b05566eb1 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v1 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index
instead of the current way of issuing a
`SELECT 1 FROM referenced_relation ...` query through SPI. This
saves a lot of work, especially when inserting into or updating a
referencing relation.
---
src/backend/utils/adt/ri_triggers.c | 535 +++++++++++++++++++---------
1 file changed, 357 insertions(+), 178 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6e3a41062f..6ed98f179a 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,7 +74,10 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2 are no longer used, because PK (referenced) table is looked up
+ * directly using ri_PrimaryKeyExists().
+ */
#define RI_PLAN_CHECK_LOOKUPPK 1
#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
@@ -221,7 +230,327 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_PrimaryKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /* Open the constraint index to be scanned. */
+ idxoid = get_constraint_index(constr_id);
+
+ /* Find the leaf partition if needed. */
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+
+ idxrel = index_open(idxoid, RowShareLock);
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+ /* Set up ScanKeys for the index scan. */
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ BTEqualStrategyNumber,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
+ PushActiveSnapshot(GetTransactionSnapshot());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+ outslot = table_slot_create(pk_rel, NULL);
+
+ found = false;
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Try to find the tuple */
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in the lockmode. */
+ if (found)
+ {
+ TM_FailureData tmfd;
+ TM_Result res;
+ int lockflags;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+ res = table_tuple_lock(pk_rel, &(outslot->tts_tid), GetActiveSnapshot(),
+ outslot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ lockflags,
+ &tmfd);
+
+ switch (res)
+ {
+ case TM_Ok:
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
+ }
+ }
+
+ PopActiveSnapshot();
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *mapped_partkey_attnums = partkey->partattrs;
+ int i;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ mapped_partkey_attnums = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ mapped_partkey_attnums[i] = map->attnums[partattno - 1];
+ }
+ }
+ }
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ int j;
+
+ for (j = 0; j < riinfo->nkeys; j++)
+ {
+ if (mapped_partkey_attnums[i] == pk_attnums[j])
+ {
+ partkey_vals[i] = pk_vals[j];
+ partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false;
+ break;
+ }
+ }
+ }
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -235,8 +564,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -316,9 +643,8 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_PrimaryKeyExists() to only include non-null columns.
*/
break;
#endif
@@ -333,70 +659,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- false,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_PrimaryKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false);
table_close(pk_rel, RowShareLock);
@@ -451,81 +719,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_PrimaryKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -2181,9 +2378,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2193,33 +2390,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2296,9 +2466,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
@@ -2768,7 +2936,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2824,8 +2995,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can
+ * be cross-type (such as when called by ri_PrimaryKeyExists()),
+ * in which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
--
2.24.1
Hi,
I was looking at this statement:
insert into f select generate_series(1, 2000000, 2);
Since certain generated values (the second half) are not in table p,
wouldn't insertion for those values fail?
I tried a scaled-down version (1/1000th) of your example:
yugabyte=# insert into f select generate_series(1, 2000, 2);
ERROR: insert or update on table "f" violates foreign key constraint
"f_a_fkey"
DETAIL: Key (a)=(1001) is not present in table "p".
For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch:
+ * Collect partition key values from the unique key.
At the end of the nested loop, should there be an assertion
that partkey->partnatts partition key values have been found?
This can be done by using a counter (initialized to 0) which is incremented
when a match is found by the inner loop.
Cheers
What is the performance when the referenced table is small? A lot of
codebooks are small, between 1000 and 10K rows.
On Tue, Jan 19, 2021 at 3:01 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
What is the performance when the referenced table is small? A lot of
codebooks are small, between 1000 and 10K rows.
I see the same ~2x improvement.
create table p (a numeric primary key);
insert into p select generate_series(1, 1000);
create table f (a bigint references p);
Unpatched:
insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 5461.377 ms (00:05.461)
Patched:
insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 2357.440 ms (00:02.357)
That's expected because the overhead of using SPI to check the PK
table, which the patch gets rid of, is the same no matter the size of
the index to be scanned.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
I was looking at this statement:

insert into f select generate_series(1, 2000000, 2);

Since certain generated values (the second half) are not in table p,
wouldn't insertion for those values fail? I tried a scaled-down
version (1/1000th) of your example:

yugabyte=# insert into f select generate_series(1, 2000, 2);
ERROR:  insert or update on table "f" violates foreign key constraint "f_a_fkey"
DETAIL:  Key (a)=(1001) is not present in table "p".
Sorry, a wrong copy-paste on my part. Try this:
create table p (a numeric primary key);
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);
-- Unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6527.652 ms (00:06.528)
update f set a = a + 1;
UPDATE 1000000
Time: 8108.310 ms (00:08.108)
-- Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3312.193 ms (00:03.312)
update f set a = a + 1;
UPDATE 1000000
Time: 4292.807 ms (00:04.293)
For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch:

+ * Collect partition key values from the unique key.

At the end of the nested loop, should there be an assertion that
partkey->partnatts partition key values have been found? This can be
done by using a counter (initialized to 0) which is incremented when
a match is found by the inner loop.
I've updated the patch to add the Assert. Thanks for taking a look.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Export-get_partition_for_tuple.patch
From 1240f04a9796760d814c9902e3f5b90ef4a4868c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v2 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a0a9..84e50ee7c8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -330,7 +328,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1292,13 +1292,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
v2-0002-Avoid-using-SPI-for-some-RI-checks.patch
From 224f36b30a287bd61d5ca4139ef0c3a9af93b21d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v2 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index
instead of the current way of issuing a
`SELECT 1 FROM referenced_relation ...` query through SPI. This
saves a lot of work, especially when inserting into or updating a
referencing relation.
---
src/backend/utils/adt/ri_triggers.c | 537 +++++++++++++++++++---------
1 file changed, 359 insertions(+), 178 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6e3a41062f..e40aec0f55 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,7 +74,10 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2 are no longer used, because PK (referenced) table is looked up
+ * directly using ri_PrimaryKeyExists().
+ */
#define RI_PLAN_CHECK_LOOKUPPK 1
#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
@@ -221,7 +230,329 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_PrimaryKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /* Open the constraint index to be scanned. */
+ idxoid = get_constraint_index(constr_id);
+
+ /* Find the leaf partition if needed. */
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+
+ idxrel = index_open(idxoid, RowShareLock);
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+ /* Set up ScanKeys for the index scan. */
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ BTEqualStrategyNumber,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
+ PushActiveSnapshot(GetTransactionSnapshot());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+ outslot = table_slot_create(pk_rel, NULL);
+
+ found = false;
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Try to find the tuple */
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in the lockmode. */
+ if (found)
+ {
+ TM_FailureData tmfd;
+ TM_Result res;
+ int lockflags;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+ res = table_tuple_lock(pk_rel, &(outslot->tts_tid), GetActiveSnapshot(),
+ outslot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ lockflags,
+ &tmfd);
+
+ switch (res)
+ {
+ case TM_Ok:
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
+ }
+ }
+
+ PopActiveSnapshot();
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *mapped_partkey_attnums = partkey->partattrs;
+ int i;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ mapped_partkey_attnums = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ mapped_partkey_attnums[i] = map->attnums[partattno - 1];
+ }
+ }
+ }
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0; i < partkey->partnatts;)
+ {
+ int j;
+
+ for (j = 0; j < riinfo->nkeys; j++)
+ {
+ if (mapped_partkey_attnums[i] == pk_attnums[j])
+ {
+ partkey_vals[i] = pk_vals[j];
+ partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false;
+ i++;
+ break;
+ }
+ }
+ }
+ Assert(i == partkey->partnatts);
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -235,8 +566,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -316,9 +645,8 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_PrimaryKeyExists() to only include non-null columns.
*/
break;
#endif
@@ -333,70 +661,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- false,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_PrimaryKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false);
table_close(pk_rel, RowShareLock);
@@ -451,81 +721,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_PrimaryKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -2181,9 +2380,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2193,33 +2392,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2296,9 +2468,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
@@ -2768,7 +2938,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2824,8 +2997,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can
+ * be cross-type (such as when called by ri_PrimaryKeyExists()),
+ * in which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
--
2.24.1
Thanks for the quick response.
+ if (mapped_partkey_attnums[i] == pk_attnums[j])
+ {
+ partkey_vals[i] = pk_vals[j];
+ partkey_isnull[i] = pk_nulls[j] == 'n' ? true : false;
+ i++;
+ break;
The way the counter (i) is incremented is not what I expected.
In the rare case where some i doesn't have a corresponding pk_attnums[j],
wouldn't there be an infinite loop?
I think the goal of adding the assertion should be to not loop infinitely
even if the invariant is not satisfied.
I guess a counter other than i would be better for this purpose.
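For illustration, a minimal sketch of what I mean, reusing the v2 patch's
variable names: the outer loop always advances, a separate counter tracks
matches, and the invariant can still be asserted afterwards.

int i, j;

for (i = 0, j = 0; i < partkey->partnatts; i++)
{
    int k;

    for (k = 0; k < riinfo->nkeys; k++)
    {
        if (mapped_partkey_attnums[i] == pk_attnums[k])
        {
            partkey_vals[j] = pk_vals[k];
            partkey_isnull[j] = (pk_nulls[k] == 'n');
            j++;
            break;
        }
    }
}
/* Had better have found values for all of the partition keys. */
Assert(j == partkey->partnatts);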
Cheers
On Mon, Jan 18, 2021 at 6:45 PM Amit Langote <amitlangote09@gmail.com>
wrote:
On Tue, Jan 19, 2021 at 2:47 AM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
I was looking at this statement:
insert into f select generate_series(1, 2000000, 2);
Since certain generated values (the second half) are not in table p,
wouldn't insertion for those values fail?
I tried a scaled-down version (1/1000th) of your example:
yugabyte=# insert into f select generate_series(1, 2000, 2);
ERROR: insert or update on table "f" violates foreign key constraint "f_a_fkey"
DETAIL: Key (a)=(1001) is not present in table "p".
Sorry, a wrong copy-paste by me. Try this:
create table p (a numeric primary key);
insert into p select generate_series(1, 2000000);
create table f (a bigint references p);
-- Unpatched
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 6527.652 ms (00:06.528)
update f set a = a + 1;
UPDATE 1000000
Time: 8108.310 ms (00:08.108)
-- Patched:
insert into f select generate_series(1, 2000000, 2);
INSERT 0 1000000
Time: 3312.193 ms (00:03.312)
update f set a = a + 1;
UPDATE 1000000
Time: 4292.807 ms (00:04.293)
For v1-0002-Avoid-using-SPI-for-some-RI-checks.patch:
+ * Collect partition key values from the unique key.
At the end of the nested loop, should there be an assertion that
partkey->partnatts partition key values have been found?
This can be done by using a counter (initialized to 0) which is
incremented when a match is found by the inner loop.
I've updated the patch to add the Assert. Thanks for taking a look.
--
Amit Langote
EDB: http://www.enterprisedb.com
After applying the v2 patches, here are some warnings:
In file included from /home/japin/Codes/postgresql/Debug/../src/include/postgres.h:47:0,
from /home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:24:
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c: In function ‘ri_PrimaryKeyExists’:
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:134:5: warning: this statement may fall through [-Wimplicit-fallthrough=]
do { \
^
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:156:2: note: in expansion of macro ‘ereport_domain’
ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
^~~~~~~~~~~~~~
/home/japin/Codes/postgresql/Debug/../src/include/utils/elog.h:229:2: note: in expansion of macro ‘ereport’
ereport(elevel, errmsg_internal(__VA_ARGS__))
^~~~~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:417:5: note: in expansion of macro ‘elog’
elog(ERROR, "unexpected table_tuple_lock status: %u", res);
^~~~
/home/japin/Codes/postgresql/Debug/../src/backend/utils/adt/ri_triggers.c:419:4: note: here
default:
^~~~~~~
--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.
On Tue, Jan 19, 2021 at 3:08 AM Amit Langote <amitlangote09@gmail.com>
wrote:
On Tue, Jan 19, 2021 at 3:01 AM Pavel Stehule <pavel.stehule@gmail.com>
wrote:
What is the performance when the referenced table is small? A lot of
codebooks are small - between 1000 and 10K rows.
I see the same ~2x improvement.
create table p (a numeric primary key);
insert into p select generate_series(1, 1000);
create table f (a bigint references p);
Unpatched:
insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 5461.377 ms (00:05.461)
Patched:
insert into f select i%1000+1 from generate_series(1, 1000000) i;
INSERT 0 1000000
Time: 2357.440 ms (00:02.357)
That's expected because the overhead of using SPI to check the PK
table, which the patch gets rid of, is the same no matter the size of
the index to be scanned.
It looks very good.
Regards
Pavel
I also get this warning. Adding a "break;" at line 418 resolves the warning.
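For reference, a minimal sketch of the arm in question with that fix
applied (surrounding cases elided):

switch (res)
{
    /* ... earlier cases unchanged ... */

    case TM_SelfModified:
    case TM_BeingModified:
    case TM_WouldBlock:
        elog(ERROR, "unexpected table_tuple_lock status: %u", res);
        break;      /* not reached; silences -Wimplicit-fallthrough */

    default:
        elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
}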
On Tue, Jan 19, 2021 at 2:56 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Thanks, will fix it.
--
Amit Langote
EDB: http://www.enterprisedb.com
v2 patch applies and passes make check and make check-world. Perhaps, given
that the missing break at line 418 caused no test failures, we could add
another regression test if we're aiming for 100% code path coverage. As it
is, I think the compiler warning was a sufficient alert.
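For example, a test along these lines (table names are hypothetical, not
from the patch) would at least exercise the new partitioned-table lookup
path:

-- hypothetical regression test for the index-scan RI check
create table pk_parted (a int primary key) partition by range (a);
create table pk_parted_1 partition of pk_parted for values from (1) to (1000);
create table fk_tab (a int references pk_parted);
insert into pk_parted values (1);
insert into fk_tab values (1);   -- passes the RI check
insert into fk_tab values (999); -- must still fail: 999 is not in pk_parted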
The code is easy to read, and the comments touch on the major points of
what complexities arise from partitioned tables.
A somewhat pedantic complaint I have brought up off-list is that this patch
continues the pattern of the variable and function names assuming that the
foreign key references the primary key of the referenced table. A foreign
key constraint need only reference a unique index; it doesn't have to be
the primary key. Granted, that unique index behaves exactly as a primary
key would, so conceptually it is very similar, but keeping with the
existing naming (pk_rel, pk_type, etc.) can lead a developer to think that
it would be just as correct to find the referenced relation and get the
primary key index from there, which would not always be correct. This
patch correctly grabs the index from the constraint itself, so no problem
there.
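For illustration, a hypothetical schema where the referenced key is not
the primary key:

create table p (a int primary key, b int not null unique);
create table f (b int references p (b));
-- The constraint's index here is the unique index on p.b; looking up
-- p's primary key index instead would check the wrong column.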
I like that this patch changes the absolute minimum of the code in order to
get a very significant performance benefit. It does so in a way that should
reduce resource pressure found in other places [1]. This will in turn
reduce the performance penalty of "doing the right thing" in terms of
defining enforced foreign keys. It seems to get a clearer performance
boost than was achieved with previous efforts at statement-level triggers.
This patch completely sidesteps the DELETE case, which has more insidious
performance implications, but is also far less common, and whose solution
will likely be very different.
[1] /messages/by-id/CAKkQ508Z6r5e3jdqhfPWSzSajLpHo3OYYOAmfeSAuPTo5VGfgw@mail.gmail.com
On Tue, Jan 19, 2021 at 3:46 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Thanks for the review. I will look into checking the coverage.
I decided not to deviate from pk_ terminology so that the new code
doesn't look too different from the other code in the file. Although,
I guess we can at least call the main function
ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
changed that.
Thanks. I hadn't noticed [1] before today, but after looking it over,
it seems that what is being proposed there can still be of use. As
long as SPI is used in ri_trigger.c, it makes sense to consider any
tweaks addressing its negative impact, especially if they are not very
invasive. There's this patch too from the last month:
https://commitfest.postgresql.org/32/2930/
Yeah, we should continue looking into ways to make the referenced-side
RI checks less bloated.
I've attached the updated patch.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v3-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From cf6955d6fc7e53d5790e1f4a12f273e2549359ad Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v3 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index
instead of the current way of issuing a
`SELECT 1 FROM referenced_relation ...` query through SPI. This
saves a lot of work, especially when inserting into or updating a
referencing relation.
---
src/backend/utils/adt/ri_triggers.c | 541 +++++++++++++++++++---------
1 file changed, 363 insertions(+), 178 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6e3a41062f..8171a14c96 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,7 +74,10 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2 are no longer used, because the PK (referenced) table is
+ * looked up directly using ri_ReferencedKeyExists().
+ */
#define RI_PLAN_CHECK_LOOKUPPK 1
#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
@@ -221,7 +230,332 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /* Open the constraint index to be scanned. */
+ idxoid = get_constraint_index(constr_id);
+
+ /* Find the leaf partition if needed. */
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+
+ idxrel = index_open(idxoid, RowShareLock);
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+ /* Set up ScanKeys for the index scan. */
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ BTEqualStrategyNumber,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
+ PushActiveSnapshot(GetTransactionSnapshot());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+ outslot = table_slot_create(pk_rel, NULL);
+
+ found = false;
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Try to find the tuple */
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in the lockmode. */
+ if (found)
+ {
+ TM_FailureData tmfd;
+ TM_Result res;
+ int lockflags;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+ res = table_tuple_lock(pk_rel, &(outslot->tts_tid), GetActiveSnapshot(),
+ outslot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ lockflags,
+ &tmfd);
+
+ switch (res)
+ {
+ case TM_Ok:
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
+ }
+ }
+ PopActiveSnapshot();
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *mapped_partkey_attnums = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ mapped_partkey_attnums = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ mapped_partkey_attnums[i] = map->attnums[partattno - 1];
+ }
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (mapped_partkey_attnums[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -235,8 +569,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -316,9 +648,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.
*/
break;
#endif
@@ -333,70 +665,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- false,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false);
table_close(pk_rel, RowShareLock);
@@ -451,81 +725,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -2181,9 +2384,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2193,33 +2396,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2296,9 +2472,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
@@ -2768,7 +2942,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2824,8 +3001,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
--
2.24.1
v3-0001-Export-get_partition_for_tuple.patch (application/octet-stream)
From 1240f04a9796760d814c9902e3f5b90ef4a4868c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v3 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 941731a0a9..84e50ee7c8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -330,7 +328,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1292,13 +1292,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
On Tue, Jan 19, 2021 at 12:00 PM Zhihong Yu <zyu@yugabyte.com> wrote:
I have done that in v3. Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
I agree with leaving the existing terminology where it is for this patch.
Changing the function name is probably enough to alert the reader that the
things that are called pks may not be precisely that.
I think that's a nice compromise, it makes the reader aware of the concept.
Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.
Patch applies to an as-of-today master, passes make check and make check-world.
No additional regression tests required, as no new functionality is
introduced.
No docs required, as there is nothing user-facing.
Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed; is
the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap map; should there be a free_attrmap()?
On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Thanks for the review.
I hadn't checked, but yes, the prevailing context is
AfterTriggerTupleContext, a short-lived one that is reset for every
trigger event tuple. I'm still inclined to explicitly free those
objects, so I've changed it that way. While at it, I also renamed
mapped_partkey_attnums to root_partattrs for readability.
Attached v4.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v4-0001-Export-get_partition_for_tuple.patch (application/octet-stream)
From dffb4e6a3b17d8ceb3351c58f65478861f36b349 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v4 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 1746cb8793..748a44f250 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -330,7 +328,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1309,13 +1309,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
v4-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From 7ea5a35bf8ff9878a925f7a2a44fe8f9802ce9ae Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v4 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index
instead of the current way of issuing a
`SELECT 1 FROM referenced_relation ...` query through SPI. This
saves a lot of work, especially when inserting into or updating a
referencing relation.
---
src/backend/utils/adt/ri_triggers.c | 546 +++++++++++++++++++---------
1 file changed, 368 insertions(+), 178 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6e3a41062f..e6da6bb326 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,7 +74,10 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2 are no longer used, because the PK (referenced) table is
+ * looked up directly using ri_ReferencedKeyExists().
+ */
#define RI_PLAN_CHECK_LOOKUPPK 1
#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
@@ -221,7 +230,337 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /* Open the constraint index to be scanned. */
+ idxoid = get_constraint_index(constr_id);
+
+ /* Find the leaf partition if needed. */
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+
+ idxrel = index_open(idxoid, RowShareLock);
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+ /* Set up ScanKeys for the index scan. */
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ BTEqualStrategyNumber,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
+ PushActiveSnapshot(GetTransactionSnapshot());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+ outslot = table_slot_create(pk_rel, NULL);
+
+ found = false;
+ index_rescan(scan, skey, num_pk, NULL, 0);
+ /* Try to find the tuple */
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in the lockmode. */
+ if (found)
+ {
+ TM_FailureData tmfd;
+ TM_Result res;
+ int lockflags;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+ res = table_tuple_lock(pk_rel, &(outslot->tts_tid), GetActiveSnapshot(),
+ outslot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ lockflags,
+ &tmfd);
+
+ switch (res)
+ {
+ case TM_Ok:
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
+ }
+ }
+
+ PopActiveSnapshot();
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (root_partattrs[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -235,8 +574,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -316,9 +653,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -333,70 +670,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- false,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false);
table_close(pk_rel, RowShareLock);
@@ -451,81 +730,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -2181,9 +2389,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2193,33 +2401,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2296,9 +2477,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
@@ -2768,7 +2947,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2824,8 +3006,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
--
2.24.1
Hi,
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' &&
OidIsValid(entry->cast_func_finfo.fn_oid))
It seems the pk_nulls[i] != 'n' check can be lifted ahead of the assignment
to the three local variables. That way, ri_HashCompareOp wouldn't be called
when pk_nulls[i] == 'n'.
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
...
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
It seems the handling for TM_Updated and TM_Deleted is the same. The cases
for these two values can be put next to each other (saving one block of
code).
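Concretely, the two arms would collapse into a single block along these
lines (a sketch; the v5 patch below does exactly this):

    case TM_Updated:
    case TM_Deleted:
        if (IsolationUsesXactSnapshot())
            ereport(ERROR,
                    (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                     errmsg("could not serialize access due to concurrent update")));
        /* Tuple was concurrently updated or deleted, so the key is gone. */
        found = false;
        break;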
Cheers
On Fri, Jan 22, 2021 at 11:10 PM Amit Langote <amitlangote09@gmail.com>
wrote:
On Fri, Jan 22, 2021 at 3:22 PM Corey Huinker <corey.huinker@gmail.com>
wrote:
I decided not to deviate from pk_ terminology so that the new code
doesn't look too different from the other code in the file. Although,
I guess we can at least call the main function
ri_ReferencedKeyExists() instead of ri_PrimaryKeyExists(), so I've
changed that.
I think that's a nice compromise, it makes the reader aware of the
concept.
I've attached the updated patch.
Missing "break" added. Check.
Comment updated. Check.
Function renamed. Check.
Attribute mapping matching test (and assertion) added. Check.
Patch applies to an as-of-today master, passes make check and check-world.
No additional regression tests required, as no new functionality is
introduced.
No docs required, as there is nothing user-facing.
Thanks for the review.
Questions:
1. There's a palloc for mapped_partkey_attnums, which is never freed; is
the prevailing memory context short-lived enough that we don't care?
2. Same question for the AttrMap map; should there be a free_attrmap()?
I hadn't checked, but yes, the prevailing context is
AfterTriggerTupleContext, a short-lived one that is reset for every
trigger event tuple. I'm still inclined to explicitly free those
objects, so changed like that. While at it, I also changed
mapped_partkey_attnums to root_partattrs for readability.
Attached v4.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
assignment to the three local variables. That way, ri_HashCompareOp
wouldn't be called when pk_nulls[i] == 'n'.
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
...
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
It seems the handling for TM_Updated and TM_Deleted is the same. The cases
for these two values can be put next to each other (saving one block of
code).
Cheers
I'll pause on reviewing v4 until you've addressed the suggestions above.
On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker <corey.huinker@gmail.com> wrote:
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
Thanks for the review.
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
assignment to the three local variables. That way, ri_HashCompareOp
wouldn't be called when pk_nulls[i] == 'n'.
Good idea, so done. Although, there can't be nulls right now.
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
...
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
It seems the handling for TM_Updated and TM_Deleted is the same. The cases
for these two values can be put next to each other (saving one block of
code).
Ah, yes. The TM_Updated case used to be handled a bit differently in
earlier unposted versions of the patch, though at some point I
concluded that the special handling was unnecessary, but didn't
realize what you just pointed out. Fixed.
I'll pause on reviewing v4 until you've addressed the suggestions above.
Here's v5.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v5-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From a86b8514f905ae2f07b4f83f2568e8306f91ad4b Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v5 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
---
src/backend/utils/adt/ri_triggers.c | 542 +++++++++++++++++++---------
1 file changed, 364 insertions(+), 178 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6e3a41062f..f9323d09d2 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,7 +74,10 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2 are no longer used, because PK (referenced) table is looked up
+ * directly using ri_ReferencedKeyExists().
+ */
#define RI_PLAN_CHECK_LOOKUPPK 1
#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
@@ -221,7 +230,333 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
int queryno, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /* Open the constraint index to be scanned. */
+ idxoid = get_constraint_index(constr_id);
+
+ /* Find the leaf partition if needed. */
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+
+ idxrel = index_open(idxoid, RowShareLock);
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+ /* Set up ScanKeys for the index scan. */
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ BTEqualStrategyNumber,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
+ PushActiveSnapshot(GetTransactionSnapshot());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+ outslot = table_slot_create(pk_rel, NULL);
+
+ found = false;
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Try to find the tuple */
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in the lockmode. */
+ if (found)
+ {
+ TM_FailureData tmfd;
+ TM_Result res;
+ int lockflags;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+ res = table_tuple_lock(pk_rel, &(outslot->tts_tid), GetActiveSnapshot(),
+ outslot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ lockflags,
+ &tmfd);
+
+ switch (res)
+ {
+ case TM_Ok:
+ break;
+
+ case TM_Updated:
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+ break;
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
+ }
+ }
+
+ PopActiveSnapshot();
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (root_partattrs[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -235,8 +570,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -316,9 +649,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -333,70 +666,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- false,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false);
table_close(pk_rel, RowShareLock);
@@ -451,81 +726,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -2181,9 +2385,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2193,33 +2397,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2296,9 +2473,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
@@ -2768,7 +2943,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2824,8 +3002,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
--
2.24.1
v5-0001-Export-get_partition_for_tuple.patch (application/octet-stream)
From d4dcaf85d9f2db9f68dae396599c52b501624ae3 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v5 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 1746cb8793..748a44f250 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -330,7 +328,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1309,13 +1309,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <amitlangote09@gmail.com>
wrote:
On Sun, Jan 24, 2021 at 11:26 AM Corey Huinker <corey.huinker@gmail.com>
wrote:
On Sat, Jan 23, 2021 at 12:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
Thanks for the review.
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (pk_nulls[i] != 'n' && OidIsValid(entry->cast_func_finfo.fn_oid))
It seems the pk_nulls[i] != 'n' check can be lifted ahead of the
assignment to the three local variables. That way, ri_HashCompareOp
wouldn't be called when pk_nulls[i] == 'n'.
Good idea, so done. Although, there can't be nulls right now.
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
...
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
It seems the handling for TM_Updated and TM_Deleted is the same. The
cases for these two values can be put next to each other (saving one block
of code).
Ah, yes. The TM_Updated case used to be handled a bit differently in
earlier unposted versions of the patch, though at some point I
concluded that the special handling was unnecessary, but didn't
realize what you just pointed out. Fixed.
I'll pause on reviewing v4 until you've addressed the suggestions above.
Here's v5.
v5 patches apply to master.
Suggested If/then optimization is implemented.
Suggested case merging is implemented.
Passes make check and make check-world yet again.
Just to confirm, we *don't* free the RI_CompareHashEntry because it points
to an entry in a hash table which is TopMemoryContext aka lifetime of the
session, correct?
Anybody else want to look this patch over before I mark it Ready For
Committer?
Hi, Amit-san,
Nice patch. I have confirmed that this solves the problem in [1]/messages/by-id/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1 with
INSERT/UPDATE.
HEAD + patch
name | bytes | pg_size_pretty
------------------+-------+----------------
CachedPlanQuery | 10280 | 10 kB
CachedPlanSource | 14616 | 14 kB
CachedPlan | 13168 | 13 kB ★ 710MB -> 13kB
(3 rows)
This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different.
Yeah, we should continue looking into the ways to make referenced-side
RI checks be less bloated.
However, as already mentioned, the problem of memory bloat on DELETE remains.
This can be solved by the patch in [1]/messages/by-id/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1, but I think it is too much to apply
this patch only for DELETE. What do you think?
[1]: /messages/by-id/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
Here's v5.
v5 patches apply to master.
Suggested If/then optimization is implemented.
Suggested case merging is implemented.
Passes make check and make check-world yet again.
Just to confirm, we don't free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct?
Right.
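For reference, a rough sketch of why nothing needs freeing here: the
compare cache is a dynahash created once per session, and hash_create()
without HASH_CONTEXT allocates in TopMemoryContext. The setup below is
illustrative only (the helper name and initial size are made up, loosely
modeled on how ri_triggers.c initializes its caches):

    static HTAB *ri_compare_cache = NULL;

    static void
    ri_InitCompareCache(void)   /* hypothetical helper name */
    {
        HASHCTL     ctl;

        memset(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(RI_CompareKey);
        ctl.entrysize = sizeof(RI_CompareHashEntry);

        /* No HASH_CONTEXT given, so the table lives in TopMemoryContext. */
        ri_compare_cache = hash_create("RI compare cache", 64, &ctl,
                                       HASH_ELEM | HASH_BLOBS);
    }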
Anybody else want to look this patch over before I mark it Ready For Committer?
Would be nice to have others look it over. Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
Kuroda-san,
On Mon, Jan 25, 2021 at 6:06 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
Hi, Amit-san,
Nice patch. I have confirmed that this solves the problem in [1] with
INSERT/UPDATE.
Thanks for testing.
HEAD + patch
name | bytes | pg_size_pretty
------------------+-------+----------------
CachedPlanQuery | 10280 | 10 kB
CachedPlanSource | 14616 | 14 kB
CachedPlan | 13168 | 13 kB ★ 710MB -> 13kB
(3 rows)
If you only tested insert/update on the referencing table, I would've
expected to see nothing in the result of that query, because the patch
eliminates all use of SPI in that case. I suspect the CachedPlan*
memory contexts you are seeing belong to some early activity in the
session. So if you try the insert/update in a freshly started
session, you would see 0 rows in the result of that query.
This patch completely sidesteps the DELETE case, which has more insidious performance implications, but is also far less common, and whose solution will likely be very different.
Yeah, we should continue looking into the ways to make referenced-side
RI checks be less bloated.
However, as already mentioned, the problem of memory bloat on DELETE remains.
This can be solved by the patch in [1], but I think it is too much to apply
this patch only for DELETE. What do you think?
[1] /messages/by-id/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
Hmm, the patch tries to solve a general problem that SPI plans are not
being shared among partitions whereas they should be. So I don't
think that it's necessarily specific to DELETE. Until we have a
solution like the patch on this thread for DELETE, it seems fine to
consider the other patch as a stopgap solution.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Mon, Jan 25, 2021 at 7:01 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Jan 25, 2021 at 6:06 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
However, as already mentioned, the problem of memory bloat on DELETE remains.
This can be solved by the patch in [1], but I think it is too much to apply
this patch only for DELETE. What do you think?
[1] /messages/by-id/cab4b85d-9292-967d-adf2-be0d803c3e23@nttcom.co.jp_1
Hmm, the patch tries to solve a general problem that SPI plans are not
being shared among partitions whereas they should be. So I don't
think that it's necessarily specific to DELETE. Until we have a
solution like the patch on this thread for DELETE, it seems fine to
consider the other patch as a stopgap solution.
Forgot to mention one thing. Alvaro, in his last email on that
thread, characterized that patch as fixing a bug, although I may have
misread that.
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi Amit-san,
On 2021/01/25 18:19, Amit Langote wrote:
On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
On Sun, Jan 24, 2021 at 6:51 AM Amit Langote <amitlangote09@gmail.com> wrote:
Here's v5.
v5 patches apply to master.
Suggested If/then optimization is implemented.
Suggested case merging is implemented.
Passes make check and make check-world yet again.
Just to confirm, we don't free the RI_CompareHashEntry because it points to an entry in a hash table which is TopMemoryContext aka lifetime of the session, correct?
Right.
Anybody else want to look this patch over before I mark it Ready For Committer?
Would be nice to have others look it over. Thanks.
Thanks for creating the patch!
I tried to review the patch. Here is my comment.
* According to this thread [1]/messages/by-id/92d6f545-5102-65d8-3c87-489f71ea0a37@enterprisedb.com, it might be better to replace elog() with
ereport() in the patch.
[1]: /messages/by-id/92d6f545-5102-65d8-3c87-489f71ea0a37@enterprisedb.com
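For context, the convention being referenced distinguishes the two calls:
elog() is for internal "can't happen" conditions, while ereport() carries
an SQLSTATE and a translatable message for errors users may actually hit.
Both forms below are taken verbatim from the patch:

    /* internal sanity check; users should never see this */
    elog(ERROR, "unexpected table_tuple_lock status: %u", res);

    /* user-facing error with an SQLSTATE */
    ereport(ERROR,
            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
             errmsg("could not serialize access due to concurrent update")));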
Thanks,
Tatsuro Yamada
Yamada-san,
On Wed, Jan 27, 2021 at 8:51 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
On 2021/01/25 18:19, Amit Langote wrote:
On Mon, Jan 25, 2021 at 9:24 AM Corey Huinker <corey.huinker@gmail.com> wrote:
Anybody else want to look this patch over before I mark it Ready For Committer?
Would be nice to have others look it over. Thanks.
Thanks for creating the patch!
I tried to review the patch. Here is my comment.
Thanks for the comment.
* According to this thread [1], it might be better to replace elog() with
ereport() in the patch.
[1]: /messages/by-id/92d6f545-5102-65d8-3c87-489f71ea0a37@enterprisedb.com
Could you please tell which elog() of the following added by the patch
you are concerned about?
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+ break;
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
All of these are meant as debugging elog()s for cases that won't
normally occur. IIUC, the discussion at the linked thread excludes
those from consideration.
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi Amit-san,
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
All of these are meant as debugging elog()s for cases that won't
normally occur. IIUC, the discussion at the linked thread excludes
those from consideration.
Thanks for your explanation.
Ah, I reread the thread, and I now realize that user-visible log messages
are the target to replace. I understand now that those elog()s are for
cases that won't normally occur. Sorry for the noise.
Regards,
Tatsuro Yamada
Hi Amit-san,
Thanks for the answer!
If you only tested insert/update on the referencing table, I would've
expected to see nothing in the result of that query, because the patch
eliminates all use of SPI in that case. I suspect the CachedPlan*
memory contexts you are seeing belong to some early activity in the
session. So if you try the insert/update in a freshly started
session, you would see 0 rows in the result of that query.
That's right.
Creating the partitioned tables in the test script (rep.sql) was using SPI.
In a new session, I confirmed that no CachedPlan is generated when only
INSERT is executed.
# only execute INSERT
postgres=# INSERT INTO ps SELECT generate_series(1,4999);
INSERT 0 4999
postgres=#
postgres=# INSERT INTO pr SELECT i, i from generate_series(1,4999)i;
INSERT 0 4999
postgres=# SELECT name, sum(used_bytes) as bytes,
pg_size_pretty(sum(used_bytes)) FROM pg_backend_memory_contexts
WHERE name LIKE 'Cached%' GROUP BY name;
name | bytes | pg_size_pretty
------+-------+----------------
(0 rows) ★ No CachedPlan
Hmm, the patch tries to solve a general problem that SPI plans are not
being shared among partitions whereas they should be. So I don't
think that it's necessarily specific to DELETE. Until we have a
solution like the patch on this thread for DELETE, it seems fine to
consider the other patch as a stopgap solution.
I see.
So this is a solution to the problem of using SPI plans in partitions,
not just DELETE.
I agree with you, I think this is a solution to the current problem.
Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3862@gmail.com
At Sun, 24 Jan 2021 20:51:39 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
Here's v5.
At Mon, 25 Jan 2021 18:19:56 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
Anybody else want to look this patch over before I mark it Ready For Committer?
Would be nice to have others look it over. Thanks.
This is a nice improvement.
0001 just looks fine.
0002:
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2 are no longer used, because PK (referenced) table is looked up
+ * directly using ri_ReferencedKeyExists().
+ */
#define RI_PLAN_CHECK_LOOKUPPK 1
#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
However, this patch still does use one of them:
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false);
It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yes, I
know the point is not the macro itself so much as the mechanism the
macro suggests, but it is confusing.) On the other hand,
RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no
longer used. (Couldn't we remove them?)
(About the latter, we can rewrite its only use, "if
(qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)", so as not to use the macro.)
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Jan 27, 2021 at 5:32 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Sun, 24 Jan 2021 20:51:39 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
Here's v5.
At Mon, 25 Jan 2021 18:19:56 +0900, Amit Langote <amitlangote09@gmail.com> wrote in
Anybody else want to look this patch over before I mark it Ready For Committer?
Would be nice to have others look it over. Thanks.
This is a nice improvement.
0001 just looks fine.
0002:
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
+/*
+ * 1 and 2 are no longer used, because PK (referenced) table is looked up
+ * directly using ri_ReferencedKeyExists().
+ */
#define RI_PLAN_CHECK_LOOKUPPK 1
#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
However, this patch still does use one of them:
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ RI_PLAN_CHECK_LOOKUPPK, false);
It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yes, I
know the point is not the macro itself so much as the mechanism the
macro suggests, but it is confusing.) On the other hand,
RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no
longer used. (Couldn't we remove them?)
Yeah, better to just remove those _PK macros and say this module no
longer runs any queries on the PK table.
How about the attached?
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v6-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From 78b9a8ef04b647e95c8a595a245c96038e53da70 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v6 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
---
src/backend/utils/adt/ri_triggers.c | 596 ++++++++++++++++++----------
1 file changed, 380 insertions(+), 216 deletions(-)
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6e3a41062f..581e283fda 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,16 +74,11 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
-/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE 3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE 4
-#define RI_PLAN_RESTRICT_CHECKREF 5
-#define RI_PLAN_SETNULL_DOUPDATE 6
-#define RI_PLAN_SETDEFAULT_DOUPDATE 7
+#define RI_PLAN_CASCADE_DEL_DODELETE 1
+#define RI_PLAN_CASCADE_UPD_DOUPDATE 2
+#define RI_PLAN_RESTRICT_CHECKREF 3
+#define RI_PLAN_SETNULL_DOUPDATE 4
+#define RI_PLAN_SETDEFAULT_DOUPDATE 5
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -220,8 +221,334 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /* Open the constraint index to be scanned. */
+ idxoid = get_constraint_index(constr_id);
+
+ /* Find the leaf partition if needed. */
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ return false;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+
+ idxrel = index_open(idxoid, RowShareLock);
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+
+ /* Set up ScanKeys for the index scan. */
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ BTEqualStrategyNumber,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
+ PushActiveSnapshot(GetTransactionSnapshot());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+ outslot = table_slot_create(pk_rel, NULL);
+
+ found = false;
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Try to find the tuple */
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in the lockmode. */
+ if (found)
+ {
+ TM_FailureData tmfd;
+ TM_Result res;
+ int lockflags;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+ res = table_tuple_lock(pk_rel, &(outslot->tts_tid), GetActiveSnapshot(),
+ outslot,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock,
+ lockflags,
+ &tmfd);
+
+ switch (res)
+ {
+ case TM_Ok:
+ break;
+
+ case TM_Updated:
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ found = false;
+ break;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ break;
+
+ case TM_SelfModified:
+ case TM_BeingModified:
+ case TM_WouldBlock:
+ elog(ERROR, "unexpected table_tuple_lock status: %u", res);
+ break;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", res);
+ }
+ }
+
+ PopActiveSnapshot();
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must
+ * map any non-root table's partition key attribute numbers to the
+ * root table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (root_partattrs[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -235,8 +562,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -316,9 +641,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -333,70 +658,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- false,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -451,81 +718,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1541,15 +1737,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1766,7 +1957,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -2136,19 +2327,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2181,9 +2364,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2193,33 +2377,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2296,14 +2453,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2334,9 +2489,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2344,22 +2499,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2461,7 +2614,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2768,7 +2921,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2824,8 +2980,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
--
2.24.1
Attachment: v6-0001-Export-get_partition_for_tuple.patch
From 535bc10fb43617ebcd939546d27fef8b5bd44565 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v6 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 1746cb8793..748a44f250 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -330,7 +328,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1309,13 +1309,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
It seems to me 1 (RI_PLAN_CHECK_LOOKUPPK) is still alive. (Yeah, I
know that doesn't mean the usefulness of the macro but the mechanism
the macro suggests, but it is confusing.) On the other hand,
RI_PLAN_CHECK_LOOKUPPK_FROM_PK and RI_PLAN_LAST_ON_PK seem to be no
longer used. (Couldn't we remove them?)

Yeah, better to just remove those _PK macros and say this module no
longer runs any queries on the PK table.

How about the attached?
Sorry for the delay.
I see that the changes were made as described.
Passes make check and make check-world yet again.
I'm marking this Ready For Committer unless someone objects.
On Mon, Mar 1, 2021 at 3:14 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Thank you Corey for the review.
--
Amit Langote
EDB: http://www.enterprisedb.com
Hi Amit,
(sorry about not cc'ing the hackers list)
I have a question about the command id handling here.
It's probably not directly related to your patch, so I am sorry if it bothers you.
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
For an insert on the FK relation, is it necessary to create a new command id every time?
I think it is only necessary when the command also modifies the referenced table,
for example when it: 1) has a modifying CTE, or
2) calls a modifying function (trigger/domain...).
All of the above seem to be parallel-unsafe and thus not supported in parallel mode.
So I was wondering if we can avoid the CommandCounterIncrement in parallel mode.
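To illustrate, here is a minimal sketch of the idea (hypothetical, not part of the patch; it assumes that in parallel mode nothing earlier in the command can have modified the referenced table, because such commands are parallel-unsafe):

/*
 * Hypothetical sketch only: skip the command counter increment in
 * parallel mode, where any earlier modification of the referenced
 * table would have been parallel-unsafe anyway.
 */
if (!IsInParallelMode())
    CommandCounterIncrement();
PushActiveSnapshot(GetTransactionSnapshot());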
Best regards,
houzj
I took a quick look at this. I guess I'm disturbed by the idea
that we'd totally replace the implementation technology for only one
variant of foreign key checks. That means that there'll be a lot
of minor details that don't act the same depending on context. One
point I was just reminded of by [1]/messages/by-id/16911-ca792f6bbe244754@postgresql.org is that the SPI approach enforces
permissions checks on the table access, which I do not see being done
anywhere in your patch. Now, maybe it's fine not to have such checks,
on the grounds that the existence of the RI constraint is sufficient
permission (the creator had to have REFERENCES permission to make it).
But I'm not sure about that. Should we add SELECT permissions checks
to this code path to make it less different?
In the same vein, the existing code actually runs the query as the
table owner (cf. SetUserIdAndSecContext in ri_PerformCheck), another
nicety you haven't bothered with. Maybe that is invisible for a
pure SELECT query but I'm not sure I would bet on it. At the very
least you're betting that the index-related operators you invoke
aren't going to care, and that nobody is going to try to use this
difference to create a security exploit via a trojan-horse index.
Shall we mention RLS restrictions? If we don't worry about that,
I think REFERENCES privilege becomes a full bypass of RLS, at
least for unique-key columns.
I wonder also what happens if the referenced table isn't a plain
heap with a plain btree index. Maybe you're accessing it at the
right level of abstraction so things will just work with some
other access methods, but I'm not sure about that. (Anybody
want to try this with a partitioned table some of whose partitions
are foreign tables?)
Lastly, ri_PerformCheck is pretty careful about not only which
snapshot it uses, but which *pair* of snapshots it uses, because
sometimes it needs to worry about data changes since the start
of the transaction. You've ignored all of that complexity AFAICS.
That's okay (I think) for RI_FKey_check which was passing
detectNewRows = false, but for sure it's not okay for
ri_Check_Pk_Match. (I kind of thought we had isolation tests
that would catch that, but apparently not.)
So, this is a cute idea, and the speedup is pretty impressive,
but I don't think it's anywhere near committable. I also wonder
whether we really want ri_triggers.c having its own copy of
low-level stuff like the tuple-locking code you copied. Seems
like a likely maintenance hazard, so maybe some more refactoring
is needed.
regards, tom lane
On Thu, Mar 4, 2021 at 5:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I took a quick look at this.
Thanks a lot for the review.
I guess I'm disturbed by the idea
that we'd totally replace the implementation technology for only one
variant of foreign key checks. That means that there'll be a lot
of minor details that don't act the same depending on context. One
point I was just reminded of by [1] is that the SPI approach enforces
permissions checks on the table access, which I do not see being done
anywhere in your patch. Now, maybe it's fine not to have such checks,
on the grounds that the existence of the RI constraint is sufficient
permission (the creator had to have REFERENCES permission to make it).
But I'm not sure about that. Should we add SELECT permissions checks
to this code path to make it less different?

In the same vein, the existing code actually runs the query as the
table owner (cf. SetUserIdAndSecContext in ri_PerformCheck), another
nicety you haven't bothered with. Maybe that is invisible for a
pure SELECT query but I'm not sure I would bet on it. At the very
least you're betting that the index-related operators you invoke
aren't going to care, and that nobody is going to try to use this
difference to create a security exploit via a trojan-horse index.
How about we do at the top of ri_ReferencedKeyExists() what
ri_PerformCheck() always does before executing a query, which is this:
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);
And then also check the permissions of the switched user on the scan
target relation's schema (ACL_USAGE) and the relation itself
(ACL_SELECT).
IOW, this:
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /* Switch to proper UID to perform check as */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /* Check namespace permissions. */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ /* Check the user has SELECT permissions on the referenced relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
/*
* Extract the unique key from the provided slot and choose the equality
@@ -414,6 +436,9 @@ ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
index_endscan(scan);
ExecDropSingleTupleTableSlot(outslot);
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
/* Don't release lock until commit. */
index_close(idxrel, NoLock);
Shall we mention RLS restrictions? If we don't worry about that,
I think REFERENCES privilege becomes a full bypass of RLS, at
least for unique-key columns.
Seeing what check_enable_rls() does when running under the security
context set by ri_PerformCheck(), it indeed seems that RLS is bypassed
when executing these RI queries. The following comment in
check_enable_rls() seems to say so:
* InNoForceRLSOperation indicates that we should not apply RLS even
* if the table has FORCE RLS set - IF the current user is the owner.
* This is specifically to ensure that referential integrity checks
* are able to still run correctly.
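In other words (a paraphrased sketch of the table-owner branch of check_enable_rls() in rls.c, not the exact source), when the checking user is the table's owner and we are in a no-force-RLS operation such as an RI check, RLS is not applied even if FORCE ROW SECURITY is set:

/* Paraphrased sketch of check_enable_rls(), table-owner branch */
if (am_table_owner)
{
    if (relform->relforcerowsecurity && !InNoForceRLSOperation())
        return RLS_ENABLED;
    return RLS_NONE;    /* owner bypasses RLS, e.g. for RI checks */
}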
I wonder also what happens if the referenced table isn't a plain
heap with a plain btree index. Maybe you're accessing it at the
right level of abstraction so things will just work with some
other access methods, but I'm not sure about that.
I believe that I've made ri_ReferencedKeyExists() use the appropriate
APIs to scan the index, lock the returned table tuple, etc., but do
you think we might be better served by introducing a new set of APIs
for this use case?
(Anybody
want to try this with a partitioned table some of whose partitions
are foreign tables?)
Partitioned tables with foreign table partitions cannot be referenced
in a foreign key, so cannot appear in this function. That's because
unique constraints are not allowed when there are foreign table
partitions.
Lastly, ri_PerformCheck is pretty careful about not only which
snapshot it uses, but which *pair* of snapshots it uses, because
sometimes it needs to worry about data changes since the start
of the transaction. You've ignored all of that complexity AFAICS.
That's okay (I think) for RI_FKey_check which was passing
detectNewRows = false, but for sure it's not okay for
ri_Check_Pk_Match. (I kind of thought we had isolation tests
that would catch that, but apparently not.)
Okay, let me closely check the ri_Check_Pk_Match() case and see if
there's any live bug.
So, this is a cute idea, and the speedup is pretty impressive,
but I don't think it's anywhere near committable. I also wonder
whether we really want ri_triggers.c having its own copy of
low-level stuff like the tuple-locking code you copied. Seems
like a likely maintenance hazard, so maybe some more refactoring
is needed.
Okay, I will see if there's a way to avoid copying too much code.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Mon, Mar 8, 2021 at 11:41 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Mar 4, 2021 at 5:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Lastly, ri_PerformCheck is pretty careful about not only which
snapshot it uses, but which *pair* of snapshots it uses, because
sometimes it needs to worry about data changes since the start
of the transaction. You've ignored all of that complexity AFAICS.
That's okay (I think) for RI_FKey_check which was passing
detectNewRows = false, but for sure it's not okay for
ri_Check_Pk_Match. (I kind of thought we had isolation tests
that would catch that, but apparently not.)

Okay, let me closely check the ri_Check_Pk_Match() case and see if
there's any live bug.
I checked, and AFAICS, the query invoked by ri_Check_Pk_Match() (that
is, without the patch) does not use the "crosscheck" snapshot at any
point during its execution. That snapshot is only used in the
table_update() and table_delete() routines, which are not involved in
the execution of ri_Check_Pk_Match()'s query.
I dug through git history and -hackers archives to understand the
origins of RI code's use of a crosscheck snapshot and came across this
discussion:
/messages/by-id/20031001150510.U45145@megazone.bigpanda.com
If I am reading the discussion and the details in subsequent commit
55d85f42a891a correctly, the crosscheck snapshot is only to be used to
ensure, under serializable isolation, that any attempts by the RI
query of updating/deleting rows that are not visible to the
transaction snapshot cause a serialization error. Use of the same
facilities in ri_Check_Pk_Match() was merely done as future-proofing,
with no particular use case to address, then and perhaps even now.
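To illustrate the point, the heap AM's use of the crosscheck snapshot amounts to roughly the following (a simplified sketch, paraphrased rather than the exact heapam.c code): after the target tuple has passed the ordinary checks in heap_update()/heap_delete(), it must additionally be visible to the crosscheck snapshot, or the operation is reported as failed so that the caller raises a serialization error.

/* Simplified sketch of the crosscheck test in heap_update()/heap_delete() */
if (result == TM_Ok && crosscheck != InvalidSnapshot &&
    !HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
    result = TM_Updated;    /* caller reports a serialization failure */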
If that is indeed the case, it does not seem particularly incorrect
for ri_ReferencedKeyExists() added by the patch to not bother with
setting up a crosscheck snapshot, even when called from
ri_Check_Pk_Match(). Am I missing something?
--
Amit Langote
EDB: http://www.enterprisedb.com
On Mon, Mar 8, 2021 at 11:41 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Mar 4, 2021 at 5:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I guess I'm disturbed by the idea
that we'd totally replace the implementation technology for only one
variant of foreign key checks. That means that there'll be a lot
of minor details that don't act the same depending on context. One
point I was just reminded of by [1] is that the SPI approach enforces
permissions checks on the table access, which I do not see being done
anywhere in your patch. Now, maybe it's fine not to have such checks,
on the grounds that the existence of the RI constraint is sufficient
permission (the creator had to have REFERENCES permission to make it).
But I'm not sure about that. Should we add SELECT permissions checks
to this code path to make it less different?

In the same vein, the existing code actually runs the query as the
table owner (cf. SetUserIdAndSecContext in ri_PerformCheck), another
nicety you haven't bothered with. Maybe that is invisible for a
pure SELECT query but I'm not sure I would bet on it. At the very
least you're betting that the index-related operators you invoke
aren't going to care, and that nobody is going to try to use this
difference to create a security exploit via a trojan-horse index.

How about we do at the top of ri_ReferencedKeyExists() what
ri_PerformCheck() always does before executing a query, which is this:

/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
SECURITY_NOFORCE_RLS);

And then also check the permissions of the switched user on the scan
target relation's schema (ACL_USAGE) and the relation itself
(ACL_SELECT).

IOW, this:

+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /* Switch to proper UID to perform check as */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ /* Check namespace permissions. */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ /* Check the user has SELECT permissions on the referenced relation. */
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));

/*
* Extract the unique key from the provided slot and choose the equality
@@ -414,6 +436,9 @@ ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
index_endscan(scan);
ExecDropSingleTupleTableSlot(outslot);

+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
/* Don't release lock until commit. */
index_close(idxrel, NoLock);
I've included these changes in the updated patch.
Shall we mention RLS restrictions? If we don't worry about that,
I think REFERENCES privilege becomes a full bypass of RLS, at
least for unique-key columns.

Seeing what check_enable_rls() does when running under the security
context set by ri_PerformCheck(), it indeed seems that RLS is bypassed
when executing these RI queries. The following comment in
check_enable_rls() seems to say so:

* InNoForceRLSOperation indicates that we should not apply RLS even
* if the table has FORCE RLS set - IF the current user is the owner.
* This is specifically to ensure that referential integrity checks
* are able to still run correctly.
I've added a comment to note that the new way of "selecting" the
referenced tuple effectively bypasses RLS, as is the case when
selecting via SPI.
I wonder also what happens if the referenced table isn't a plain
heap with a plain btree index. Maybe you're accessing it at the
right level of abstraction so things will just work with some
other access methods, but I'm not sure about that.

I believe that I've made ri_ReferencedKeyExists() use the appropriate
APIs to scan the index, lock the returned table tuple, etc., but do
you think we might be better served by introducing a new set of APIs
for this use case?
I concur that by using the interfaces defined in genam.h and
tableam.h, the patch accounts for cases involving other access methods.
That said, I had overlooked one bit in the new code that is specific
to btree AM, which is the hard-coding of BTEqualStrategyNumber in the
following:
/* Initialize the scankey. */
ScanKeyInit(&skey[i],
pkattno,
BTEqualStrategyNumber,
regop,
pk_vals[i]);
In the updated patch, I've added code to look up the index-specific
strategy number to pass here.
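Concretely, the scankey setup now derives the strategy number from the
index column's operator family instead of hard-coding the btree value:

Oid opfamily = idxrel->rd_opfamily[i];
StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);

/* Initialize the scankey. */
ScanKeyInit(&skey[i],
            pkattno,
            strat,
            regop,
            pk_vals[i]);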
Lastly, ri_PerformCheck is pretty careful about not only which
snapshot it uses, but which *pair* of snapshots it uses, because
sometimes it needs to worry about data changes since the start
of the transaction. You've ignored all of that complexity AFAICS.
That's okay (I think) for RI_FKey_check which was passing
detectNewRows = false, but for sure it's not okay for
ri_Check_Pk_Match. (I kind of thought we had isolation tests
that would catch that, but apparently not.)

Okay, let me closely check the ri_Check_Pk_Match() case and see if
there's any live bug.
As mentioned in my earlier reply, there doesn't seem to be a need for
ri_Check_Pk_Match() to set the crosscheck snapshot as it is basically
unused.
So, this is a cute idea, and the speedup is pretty impressive,
but I don't think it's anywhere near committable. I also wonder
whether we really want ri_triggers.c having its own copy of
low-level stuff like the tuple-locking code you copied. Seems
like a likely maintenance hazard, so maybe some more refactoring
is needed.

Okay, I will see if there's a way to avoid copying too much code.
I thought sharing the tuple-locking code with ExecLockRows(), which
seemed closest in semantics to what the new code is doing, might not
be such a bad idea, but I'm not sure I came up with a great interface for
the shared function. Actually, there are other places having their
own copies of tuple-locking logic, but they deal with the locking
result in their own unique ways, so I didn't get excited about finding
a way to make the new function accommodate their needs. I also admit
that I may have totally misunderstood what refactoring you were
referring to in your comment.
Updated patches attached. Sorry about the delay.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v7-0002-Avoid-using-SPI-for-some-RI-checks.patch
From 2a13b6d04800521fcd523846186cabe9cdeea2a4 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v7 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
---
src/backend/executor/nodeLockRows.c | 161 ++++----
src/backend/utils/adt/ri_triggers.c | 616 ++++++++++++++++++----------
src/include/executor/executor.h | 9 +
3 files changed, 488 insertions(+), 298 deletions(-)
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index b2e5c30079..ae07465127 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -75,11 +75,9 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
+ bool skip;
/* clear any leftover test tuple for this rel */
markSlot = EvalPlanQualSlot(&node->lr_epqstate, erm->relation, erm->rti);
@@ -175,74 +173,12 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed);
+ if (skip)
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -277,6 +213,91 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with given TID with given lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 09a2ad2881..93b994dc2b 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,16 +74,11 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
-/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE 3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE 4
-#define RI_PLAN_RESTRICT_CHECKREF 5
-#define RI_PLAN_SETNULL_DOUPDATE 6
-#define RI_PLAN_SETDEFAULT_DOUPDATE 7
+#define RI_PLAN_CASCADE_DEL_DODELETE 1
+#define RI_PLAN_CASCADE_UPD_DOUPDATE 2
+#define RI_PLAN_RESTRICT_CHECKREF 3
+#define RI_PLAN_SETNULL_DOUPDATE 4
+#define RI_PLAN_SETDEFAULT_DOUPDATE 5
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -224,8 +225,331 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking for
+ * cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ goto done;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ /*
+ * Start the scan. To make the changes of the current command visible to
+ * the scan and for subsequent locking of the tuple (if any) found,
+ * increment the command counter.
+ */
+ CommandCounterIncrement();
+ PushActiveSnapshot(GetTransactionSnapshot());
+ scan = index_beginscan(pk_rel, idxrel, GetActiveSnapshot(), num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Try to find the tuple */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in key share mode. */
+ if (found)
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ GetActiveSnapshot(),
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+
+ PopActiveSnapshot();
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+done:
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must
+ * map any non-root table's partition key attribute numbers to the
+ * root table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (root_partattrs[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -239,8 +563,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -320,9 +642,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -337,70 +659,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- false,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -455,81 +719,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1545,15 +1738,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1770,7 +1958,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1907,26 +2095,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2195,19 +2382,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2240,9 +2419,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2252,33 +2432,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2355,14 +2508,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2393,9 +2544,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2403,22 +2554,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2520,7 +2669,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2827,7 +2976,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2883,8 +3035,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 071e363d54..2da52e7ba5 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -231,6 +231,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.24.1
Attachment: v7-0001-Export-get_partition_for_tuple.patch (application/x-patch)
From 489133ced948bcd6892ee7ac22f2b13962d5ee8d Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v7 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5967..f6323d3c8d 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -183,8 +183,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -331,7 +329,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1311,13 +1311,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
Updated patches attached. Sorry about the delay.
Rebased over the recent DETACH PARTITION CONCURRENTLY work.
Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.
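For reference, here is the relevant excerpt from the attached 0002;
ri_ReferencedKeyExists() now takes the latest snapshot and makes it
active before scanning:

CommandCounterIncrement();
snap = RegisterSnapshot(GetLatestSnapshot());
/* Set ActiveSnapshot so that any sysscans use this snapshot. */
PushActiveSnapshot(snap);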
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v8-0001-Export-get_partition_for_tuple.patch (application/octet-stream)
From d250c5a18ce975f72368bbf9a2331bd2bb5a8a71 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v8 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 558060e080..1d9012cde3 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -183,8 +183,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -331,7 +329,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1324,13 +1324,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
v8-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From 90d15115d6afe05f5c6b9c0831481dc0ce0d6404 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v8 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation so that they simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
---
src/backend/executor/nodeLockRows.c | 161 +++----
src/backend/utils/adt/ri_triggers.c | 630 ++++++++++++++++++----------
src/include/executor/executor.h | 9 +
3 files changed, 498 insertions(+), 302 deletions(-)
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index b2e5c30079..ae07465127 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -75,11 +75,9 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
+ bool skip;
/* clear any leftover test tuple for this rel */
markSlot = EvalPlanQualSlot(&node->lr_epqstate, erm->relation, erm->rti);
@@ -175,74 +173,12 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed);
+ if (skip)
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -277,6 +213,91 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with given TID with given lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 7c77c338ce..c75c36c93b 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,16 +74,11 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
-/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE 3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE 4
-#define RI_PLAN_RESTRICT_CHECKREF 5
-#define RI_PLAN_SETNULL_DOUPDATE 6
-#define RI_PLAN_SETDEFAULT_DOUPDATE 7
+#define RI_PLAN_CASCADE_DEL_DODELETE 1
+#define RI_PLAN_CASCADE_UPD_DOUPDATE 2
+#define RI_PLAN_RESTRICT_CHECKREF 3
+#define RI_PLAN_SETNULL_DOUPDATE 4
+#define RI_PLAN_SETDEFAULT_DOUPDATE 5
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -224,8 +225,341 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+/*
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * Perform any scans below with the latest snapshot to not miss any
+ * live tuples, including any sysscans that may occur. For example,
+ * when find_leaf_pk_rel() reads pg_inherits via
+ * RelationGetPartitionDesc(), it's crucial that we set the current
+ * snapshot here so that any partitions whose detachment is pending
+ * are excluded from being added to the list of partitions that may
+ * be returned. Also increment the command counter to make the
+ * changes of the current command visible.
+ */
+ CommandCounterIncrement();
+ snap = RegisterSnapshot(GetLatestSnapshot());
+ /* Set ActiveSnapshot so that any sysscans use this snapshot. */
+ PushActiveSnapshot(snap);
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ goto done;
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Try to find the tuple */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in key share mode. */
+ if (found)
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+done:
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ UnregisterSnapshot(snap);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDesc partdesc = RelationGetPartitionDesc(pk_rel, false);
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (root_partattrs[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ Assert(partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (partdesc->is_leaf[partidx])
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -239,8 +573,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -320,9 +652,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -337,74 +669,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh
- * in order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -459,81 +729,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1549,15 +1748,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1774,7 +1968,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1911,26 +2105,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2199,19 +2392,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2244,9 +2429,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2256,33 +2442,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2359,14 +2518,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2397,9 +2554,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2407,22 +2564,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2524,7 +2679,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2831,7 +2986,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2887,8 +3045,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 26dcc4485e..5454410c30 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.24.1
Hi,
+ skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed);
+ if (skip)
It seems the variable skip is only used above. The variable is not needed;
the if statement can directly check the return value.
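That is, something like this (untested sketch):

/* skip tuple if it couldn't be locked */
if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
estate->es_snapshot, estate->es_output_cid,
lockmode, erm->waitPolicy, &epq_needed))
goto lnext;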
+ * Locks tuple with given TID with given lockmode following given wait
'given' appears three times in the above sentence. Maybe the following is a
bit easier to read:
Locks tuple with the specified TID, lockmode following given wait policy
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'.
I think 'same' is not needed here since the remaining part of the sentence
has adequately identified the key.
+ if (leaf_pk_rel == NULL)
+ goto done;
It would be better to avoid the goto by including the cleanup statements in
the if block and returning.
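That is, something like the following, with the cleanup statements copied
from the function's done: label (untested sketch):

if (leaf_pk_rel == NULL)
{
/* Same cleanup as at the done: label. */
SetUserIdAndSecContext(save_userid, save_sec_context);
PopActiveSnapshot();
UnregisterSnapshot(snap);
return false;
}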
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ found = true;
+
+ /* Found tuple, try to lock it in key share mode. */
+ if (found)
Since found is only assigned in one place, the two if statements can be
combined into one.
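That is, roughly (untested sketch):

/* If we find the tuple, try to lock it in key share mode. */
if (index_getnext_slot(scan, ForwardScanDirection, outslot))
found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
snap, GetCurrentCommandId(false),
LockTupleKeyShare,
LockWaitBlock, NULL);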
Cheers
On Fri, Apr 2, 2021 at 5:46 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
Updated patches attached. Sorry about the delay.
Rebased over the recent DETACH PARTITION CONCURRENTLY work.
Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.
--
Amit Langote
EDB: http://www.enterprisedb.com
On 2021-Apr-02, Amit Langote wrote:
On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
Updated patches attached. Sorry about the delay.
Rebased over the recent DETACH PARTITION CONCURRENTLY work.
Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.
Hmm, I wonder if that stuff should be using a PartitionDirectory? (I
didn't actually understand what your code is doing, so please forgive if
this is a silly question.)
--
Álvaro Herrera 39°49'30"S 73°17'W
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php
Hi Alvaro,
On Sat, Apr 3, 2021 at 12:01 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Apr-02, Amit Langote wrote:
On Sat, Mar 20, 2021 at 10:21 PM Amit Langote <amitlangote09@gmail.com> wrote:
Updated patches attached. Sorry about the delay.
Rebased over the recent DETACH PARTITION CONCURRENTLY work.
Apparently, ri_ReferencedKeyExists() was using the wrong snapshot.
Hmm, I wonder if that stuff should be using a PartitionDirectory? (I
didn't actually understand what your code is doing, so please forgive if
this is a silly question.)
No problem, I wondered about that too when rebasing.
My instinct *was* that maybe there's no need for it, because
find_leaf_pk_rel()'s use of a PartitionDesc is so limited in duration,
and in the scope of what it calls, that there's no need to worry about
it getting invalidated while in use. But I may be
wrong about that, because get_partition_for_tuple() can call arbitrary
user-defined functions, which may result in invalidation messages
being processed and an unguarded PartitionDesc getting wiped out under
us.
So, I've added PartitionDirectory protection in find_leaf_pk_rel() in
the attached updated version.
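The shape of that protection is roughly as below; the two-argument
CreatePartitionDirectory() signature is an assumption based on the
post-DETACH-CONCURRENTLY API, so see the attached patch for the exact code:

PartitionDirectory partdir;
PartitionDesc partdesc;

/* Pin the PartitionDescs we look at while descending the tree. */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
partdesc = PartitionDirectoryLookup(partdir, pk_rel);
/* ... descend to the leaf partition using partdesc as before ... */
DestroyPartitionDirectory(partdir);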
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v8-0001-Export-get_partition_for_tuple.patch (application/octet-stream)
From 0084d66c66ecf785c7019a0439846468dfd53138 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v8 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 558060e080..1d9012cde3 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -183,8 +183,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -331,7 +329,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1324,13 +1324,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index d30ffde7d9..e5888d54f1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -125,5 +125,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
v8-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From e0feac67d323553af3477af15b1032d8a4a2eba3 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v8 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation so that they simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
---
src/backend/executor/nodeLockRows.c | 161 ++++---
src/backend/utils/adt/ri_triggers.c | 641 ++++++++++++++++++----------
src/include/executor/executor.h | 9 +
3 files changed, 508 insertions(+), 303 deletions(-)
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index b2e5c30079..41530f634a 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -75,10 +75,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -175,74 +172,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -277,6 +211,91 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 7c77c338ce..36f66a5896 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,16 +74,11 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
-/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE 3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE 4
-#define RI_PLAN_RESTRICT_CHECKREF 5
-#define RI_PLAN_SETNULL_DOUPDATE 6
-#define RI_PLAN_SETDEFAULT_DOUPDATE 7
+#define RI_PLAN_CASCADE_DEL_DODELETE 1
+#define RI_PLAN_CASCADE_UPD_DOUPDATE 2
+#define RI_PLAN_RESTRICT_CHECKREF 3
+#define RI_PLAN_SETNULL_DOUPDATE 4
+#define RI_PLAN_SETDEFAULT_DOUPDATE 5
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -224,8 +225,352 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * To avoid missing any live tuples, perform any scans below with the
+ * latest snapshot, including any sysscans that may occur. Also increment
+ * the command counter to make the changes of the current command visible.
+ */
+ CommandCounterIncrement();
+ snap = RegisterSnapshot(GetLatestSnapshot());
+ /* Set ActiveSnapshot so that any sysscans use this snapshot. */
+ PushActiveSnapshot(snap);
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ UnregisterSnapshot(snap);
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ UnregisterSnapshot(snap);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (root_partattrs[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ partdesc = PartitionDirectoryLookup(partdir, pk_rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+ /* Don't index into the arrays if no partition was found. */
+ if (partidx >= 0)
+ {
+ partoid = partdesc->oids[partidx];
+ is_leaf = partdesc->is_leaf[partidx];
+ }
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -239,8 +584,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -320,9 +663,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -337,74 +680,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh
- * in order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -459,81 +740,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1549,15 +1759,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1774,7 +1979,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1911,26 +2116,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2199,19 +2403,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2244,9 +2440,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2256,33 +2453,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2359,14 +2529,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2397,9 +2565,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2407,22 +2575,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2524,7 +2690,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2831,7 +2997,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2887,8 +3056,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 26dcc4485e..5454410c30 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.24.1
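Condensed, the non-SPI lookup at the heart of ri_ReferencedKeyExists()
above reduces to the following sketch (snapshot and permission setup,
cross-type casts, and the partitioned-table descent elided; all names
as in the patch):

/* Build one scan key per PK column from the extracted key values. */
for (i = 0; i < num_pk; i++)
{
    ScanKeyInit(&skey[i], i + 1,
                get_op_opfamily_strategy(eq_oprs[i], idxrel->rd_opfamily[i]),
                get_opcode(eq_oprs[i]),
                pk_vals[i]);
    skey[i].sk_collation = idxrel->rd_indcollation[i];
}

/* Scan the constraint's unique index directly, with no SQL involved. */
scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
index_rescan(scan, skey, num_pk, NULL, 0);

/*
 * If a matching tuple exists, lock it in key share mode, just as the
 * old SELECT ... FOR KEY SHARE query would have done.
 */
if (index_getnext_slot(scan, ForwardScanDirection, outslot))
    found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
                               snap, GetCurrentCommandId(false),
                               LockTupleKeyShare, LockWaitBlock, NULL);
index_endscan(scan);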
On Fri, Apr 2, 2021 at 11:55 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
+ skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
+                            estate->es_snapshot, estate->es_output_cid,
+                            lockmode, erm->waitPolicy, &epq_needed);
+ if (skip)

It seems the variable skip is only used above. The variable is not needed - the if statement can directly check the return value.
+ * Locks tuple with given TID with given lockmode following given wait
'given' appears three times in the above sentence. Maybe the following is a bit easier to read:
Locks tuple with the specified TID, lockmode following given wait policy
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'.

I think 'same' is not needed here since the remaining part of the sentence has adequately identified the key.
+ if (leaf_pk_rel == NULL)
+     goto done;

It would be better to avoid goto by including the cleanup statements in the if block and return.
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+     found = true;
+
+ /* Found tuple, try to lock it in key share mode. */
+ if (found)

Since found is only assigned in one place, the two if statements can be combined into one.
Thanks for taking a look. I agree with most of your suggestions and
have incorporated them in the v8 just posted.
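For instance, the skip variable is gone in v8; the result of
ExecLockTableTuple() now feeds the condition directly:

/* skip tuple if it couldn't be locked */
if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
                        estate->es_snapshot, estate->es_output_cid,
                        lockmode, erm->waitPolicy, &epq_needed))
    goto lnext;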
--
Amit Langote
EDB: http://www.enterprisedb.com
On Sun, Apr 4, 2021 at 1:51 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Apr 2, 2021 at 11:55 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
+ skip = !ExecLockTableTuple(erm->relation, &tid, markSlot,
+                            estate->es_snapshot, estate->es_output_cid,
+                            lockmode, erm->waitPolicy, &epq_needed);
+ if (skip)

It seems the variable skip is only used above. The variable is not needed - the if statement can directly check the return value.
+ * Locks tuple with given TID with given lockmode following given wait
'given' appears three times in the above sentence. Maybe the following is a bit easier to read:
Locks tuple with the specified TID, lockmode following given wait policy
+ * Checks whether a tuple containing the same unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'.

I think 'same' is not needed here since the remaining part of the sentence has adequately identified the key.
+ if (leaf_pk_rel == NULL)
+     goto done;

It would be better to avoid goto by including the cleanup statements in the if block and return.
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+     found = true;
+
+ /* Found tuple, try to lock it in key share mode. */
+ if (found)

Since found is only assigned in one place, the two if statements can be combined into one.
Thanks for taking a look. I agree with most of your suggestions and
have incorporated them in the v8 just posted.
The 2nd patch does not apply on HEAD; please post a rebased version:
error: patch failed: src/backend/utils/adt/ri_triggers.c:337
error: src/backend/utils/adt/ri_triggers.c: patch does not apply
Regards,
Vignesh
On Tue, Jul 6, 2021 at 1:56 AM vignesh C <vignesh21@gmail.com> wrote:
The 2nd patch does not apply on HEAD; please post a rebased version:
error: patch failed: src/backend/utils/adt/ri_triggers.c:337
error: src/backend/utils/adt/ri_triggers.c: patch does not apply
Thanks for the heads up.
Rebased patches attached.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v9-0001-Export-get_partition_for_tuple.patch
From a0163dba3e5b8527d9f06549966b854477fccdab Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v9 1/2] Export get_partition_for_tuple()
Currently, only execPartition.c can see it, although a subsequent
change will require it to be callable from another module. To make
this possible, also change the interface to accept the partitioning
information using more widely available structs.
---
src/backend/executor/execPartition.c | 14 +++++++-------
src/include/executor/execPartition.h | 3 +++
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 606c920b06..3275f22964 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -175,8 +175,6 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -310,7 +308,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1239,13 +1239,13 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* Return value is index of the partition (>= 0 and < partdesc->nparts) if one
* found or -1 if none found.
*/
-static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+int
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..243d764866 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,5 +124,8 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
#endif /* EXECPARTITION_H */
--
2.24.1
v9-0002-Avoid-using-SPI-for-some-RI-checks.patch
From 0b0c4dca3525b7c1058b0b109c36f8e5c0a83d23 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 10 Dec 2020 20:21:29 +0900
Subject: [PATCH v9 2/2] Avoid using SPI for some RI checks
This modifies the subroutines that RI trigger functions call to
check whether a given referenced value exists in the referenced
relation, making them simply scan the foreign key constraint's
unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
---
src/backend/executor/nodeLockRows.c | 161 ++++---
src/backend/utils/adt/ri_triggers.c | 641 ++++++++++++++++++----------
src/include/executor/executor.h | 9 +
3 files changed, 508 insertions(+), 303 deletions(-)
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 7583973f4a..77ac776e27 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,91 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 96269fc2ad..fce4b27a99 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,16 +74,11 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
-/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE 3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE 4
-#define RI_PLAN_RESTRICT_CHECKREF 5
-#define RI_PLAN_SETNULL_DOUPDATE 6
-#define RI_PLAN_SETDEFAULT_DOUPDATE 7
+#define RI_PLAN_CASCADE_DEL_DODELETE 1
+#define RI_PLAN_CASCADE_UPD_DOUPDATE 2
+#define RI_PLAN_RESTRICT_CHECKREF 3
+#define RI_PLAN_SETNULL_DOUPDATE 4
+#define RI_PLAN_SETDEFAULT_DOUPDATE 5
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -224,8 +225,352 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Relation find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * To avoid missing any live tuples, perform any scans below with the
+ * latest snapshot, including any sysscans that may occur. Also increment
+ * the command counter to make the changes of the current command visible.
+ */
+ CommandCounterIncrement();
+ snap = RegisterSnapshot(GetLatestSnapshot());
+ /* Set ActiveSnapshot so that any sysscans use this snapshot. */
+ PushActiveSnapshot(snap);
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+
+ leaf_pk_rel = find_leaf_pk_rel(pk_rel, riinfo,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ UnregisterSnapshot(snap);
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ UnregisterSnapshot(snap);
+
+ return found;
+}
+
+/*
+ * Finds the leaf partition of the partitioned relation 'root_pk_rel' that
+ * might contain the specified unique key.
+ *
+ * Returns NULL if no such leaf partition is found.
+ *
+ * This works because the unique key defined on the root relation always
+ * contains the partition key columns of all ancestors leading up to a
+ * given leaf partition.
+ */
+static Relation
+find_leaf_pk_rel(Relation root_pk_rel, const RI_ConstraintInfo *riinfo,
+ Datum *pk_vals, char *pk_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation pk_rel = root_pk_rel;
+ const AttrNumber *pk_attnums = riinfo->pk_attnums;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(pk_rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (pk_rel != root_pk_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_pk_rel),
+ RelationGetDescr(pk_rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < riinfo->nkeys; k++)
+ {
+ if (root_partattrs[i] == pk_attnums[k])
+ {
+ partkey_vals[j] = pk_vals[k];
+ partkey_isnull[j] = (pk_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, pk_rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+ /* Don't index into the arrays if no partition was found. */
+ if (partidx >= 0)
+ {
+ partoid = partdesc->oids[partidx];
+ is_leaf = partdesc->is_leaf[partidx];
+ }
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (pk_rel != root_pk_rel)
+ table_close(pk_rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ pk_rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(pk_rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return pk_rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
/*
* RI_FKey_check -
@@ -239,8 +584,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -320,9 +663,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -337,74 +680,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -459,81 +740,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1549,15 +1759,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1774,7 +1979,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1911,26 +2116,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2199,19 +2403,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2244,9 +2440,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2256,33 +2453,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2359,14 +2529,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2397,9 +2565,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2407,22 +2575,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by the caller, assume the violator tuple
+ * came from the relation being complained about.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2524,7 +2690,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2831,7 +2997,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2887,8 +3056,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 3dc03c913e..8d30d8752c 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
--
2.24.1
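
To make the new executor.h entry point above concrete, here is a minimal
sketch of a caller, condensed from what the patch's ri_ReferencedKeyExists()
does: scan an index for a match and key-share lock the tuple found, treating
a failed lock the same as "no match". The function name lock_first_match()
and its parameter list are illustrative only, not part of the patch.

#include "postgres.h"

#include "access/genam.h"
#include "access/tableam.h"
#include "access/xact.h"
#include "executor/executor.h"
#include "utils/snapmgr.h"

static bool
lock_first_match(Relation rel, Relation idxrel, ScanKey skeys, int nkeys)
{
	Snapshot	snap = GetTransactionSnapshot();
	IndexScanDesc scan;
	TupleTableSlot *slot;
	bool		found = false;

	scan = index_beginscan(rel, idxrel, snap, nkeys, 0);
	index_rescan(scan, skeys, nkeys, NULL, 0);

	slot = table_slot_create(rel, NULL);
	if (index_getnext_slot(scan, ForwardScanDirection, slot))
	{
		/* Pass NULL for epq_needed; no EvalPlanQual machinery here. */
		found = ExecLockTableTuple(rel, &slot->tts_tid, slot, snap,
								   GetCurrentCommandId(false),
								   LockTupleKeyShare, LockWaitBlock,
								   NULL);
	}

	index_endscan(scan);
	ExecDropSingleTupleTableSlot(slot);

	return found;
}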
Rebased patches attached.
I'm reviewing the changes since v6, which was my last review.
Making ExecLockTableTuple() its own function makes sense.
Snapshots are now accounted for.
The changes that account for n-level partitioning make sense as well.
Passes make check-world.
Not user facing, so no user documentation required.
Marking as ready for committer again.
Amit Langote <amitlangote09@gmail.com> writes:
Rebased patches attached.
I've spent some more time digging into the snapshot-management angle.
I think you are right that the crosscheck_snapshot isn't really an
issue because the executor pays no attention to it for SELECT, but
that doesn't mean that there's no problem, because the test_snapshot
behavior is different too. By my reading of it, the intention of the
existing code is to insist that when IsolationUsesXactSnapshot()
is true and we *weren't* saying detectNewRows, the query should be
restricted to only see rows visible to the transaction snapshot.
Which I think is proper: an RR transaction shouldn't be allowed to
insert referencing rows that depend on a referenced row it can't see.
On the other hand, it's reasonable for ri_Check_Pk_Match to use
detectNewRows=true, because in that case what we're doing is allowing
an RR transaction to depend on the continued existence of a PK value
that was deleted and replaced since the start of its transaction.
It appears to me that commit 71f4c8c6f (DETACH PARTITION CONCURRENTLY)
broke the semantics here, because now things work differently with a
partitioned PK table than with a plain table, thanks to not bothering
to distinguish questions of how to handle partition detachment from
questions of visibility of individual data tuples. We evidently
haven't got test coverage for this :-(, which is perhaps not so
surprising because all this behavior long predates the isolationtester
infrastructure that would've allowed us to test it mechanically.
Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).
It looks like you've addressed the other complaints I raised back in
March, so that's forward progress anyway. I do still find myself a
bit dissatisfied with the code factorization, because it seems like
find_leaf_pk_rel() doesn't belong here but rather in some partitioning
module. OTOH, if that means exposing RI_ConstraintInfo to the world,
that wouldn't be nice either.
regards, tom lane
On 2021-Nov-11, Tom Lane wrote:
It appears to me that commit 71f4c8c6f (DETACH PARTITION CONCURRENTLY)
broke the semantics here, because now things work differently with a
partitioned PK table than with a plain table, thanks to not bothering
to distinguish questions of how to handle partition detachment from
questions of visibility of individual data tuples. We evidently
haven't got test coverage for this :-(, which is perhaps not so
surprising because all this behavior long predates the isolationtester
infrastructure that would've allowed us to test it mechanically.
Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).
I think we (I) should definitely pursue fixing whatever was broken by
DETACH CONCURRENTLY, back to pg14, independently of this patch ... but
I would appreciate some insight into what the problem is.
--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I think we (I) should definitely pursue fixing whatever was broken by
DETACH CONCURRENTLY, back to pg14, independently of this patch ... but
I would appreciate some insight into what the problem is.
Here's what I'm on about:
regression=# create table pk (f1 int primary key);
CREATE TABLE
regression=# insert into pk values(1);
INSERT 0 1
regression=# create table fk (f1 int references pk);
CREATE TABLE
regression=# begin isolation level repeatable read ;
BEGIN
regression=*# select * from pk; -- to establish xact snapshot
f1
----
1
(1 row)
now, in another session, do:
regression=# insert into pk values(2);
INSERT 0 1
back at the RR transaction, we can't see that:
regression=*# select * from pk; -- still no row 2
f1
----
1
(1 row)
so we get:
regression=*# insert into fk values(1);
INSERT 0 1
regression=*# insert into fk values(2);
ERROR: insert or update on table "fk" violates foreign key constraint "fk_f1_fkey"
DETAIL: Key (f1)=(2) is not present in table "pk".
IMO that behavior is correct. If you use READ COMMITTED, then
SELECT can see row 2 as soon as it's committed, and so can the
FK check, and again that's correct.
In v13, the behavior is the same if "pk" is a partitioned table instead
of a plain one. In HEAD, it's not:
regression=# drop table pk, fk;
DROP TABLE
regression=# create table pk (f1 int primary key) partition by list(f1);
CREATE TABLE
regression=# create table pk1 partition of pk for values in (1,2);
CREATE TABLE
regression=# insert into pk values(1);
INSERT 0 1
regression=# create table fk (f1 int references pk);
CREATE TABLE
regression=# begin isolation level repeatable read ;
BEGIN
regression=*# select * from pk; -- to establish xact snapshot
f1
----
1
(1 row)
--- now insert row 2 in another session
regression=*# select * from pk; -- still no row 2
f1
----
1
(1 row)
regression=*# insert into fk values(1);
INSERT 0 1
regression=*# insert into fk values(2);
INSERT 0 1
regression=*#
So I say that's busted, and the cause is this hunk from 71f4c8c6f:
@@ -392,11 +392,15 @@ RI_FKey_check(TriggerData *trigdata)
/*
* Now check that foreign key exists in PK table
+ *
+ * XXX detectNewRows must be true when a partitioned table is on the
+ * referenced side. The reason is that our snapshot must be fresh
+ * in order for the hack in find_inheritance_children() to work.
*/
ri_PerformCheck(riinfo, &qkey, qplan,
fk_rel, pk_rel,
NULL, newslot,
- false,
+ pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
SPI_OK_SELECT);
if (SPI_finish() != SPI_OK_FINISH)
I think you need some signalling mechanism that's less global than
ActiveSnapshot to tell the partition-lookup machinery what to do
in this context.
regards, tom lane
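The v10 patch posted later in the thread gestures at this: its partition
lookup creates a private PartitionDirectory and passes an explicit second
argument (CreatePartitionDirectory(CurrentMemoryContext, true)) instead of
having the lookup machinery consult ActiveSnapshot. A sketch of the idea;
the flag name and its sense here are illustrative only:

/*
 * Sketch only: the creator of a PartitionDirectory states explicitly
 * whether detach-pending partitions should be visible, instead of the
 * lookup machinery inferring that from whichever snapshot is active.
 */
PartitionDirectory CreatePartitionDirectory(MemoryContext mcxt,
                                            bool include_detached);

/* An RI check wants the most inclusive view of the partition set: */
partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
partdesc = PartitionDirectoryLookup(partdir, pk_rel);
/* ... route the key, open the chosen partition ... */
DestroyPartitionDirectory(partdir);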
I wrote:
Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).
I wrote that thinking that point (2), ie fix the choice of snapshots for
these RI queries, would solve the brokenness in partitioned tables,
so that (3) would potentially only require hacking up v14.
However after thinking more I realize that (2) will break the desired
behavior for concurrent partition detaches, because that's being driven
off ActiveSnapshot. So we really need a solution that decouples the
partition detachment logic from ActiveSnapshot, in both branches.
regards, tom lane
On Fri, Nov 12, 2021 at 8:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
Rebased patches attached.
I've spent some more time digging into the snapshot-management angle.
Thanks for looking at this.
I think you are right that the crosscheck_snapshot isn't really an
issue because the executor pays no attention to it for SELECT, but
that doesn't mean that there's no problem, because the test_snapshot
behavior is different too. By my reading of it, the intention of the
existing code is to insist that when IsolationUsesXactSnapshot()
is true and we *weren't* saying detectNewRows, the query should be
restricted to only see rows visible to the transaction snapshot.
Which I think is proper: an RR transaction shouldn't be allowed to
insert referencing rows that depend on a referenced row it can't see.
On the other hand, it's reasonable for ri_Check_Pk_Match to use
detectNewRows=true, because in that case what we're doing is allowing
an RR transaction to depend on the continued existence of a PK value
that was deleted and replaced since the start of its transaction.
It appears to me that commit 71f4c8c6f (DETACH PARTITION CONCURRENTLY)
broke the semantics here, because now things work differently with a
partitioned PK table than with a plain table, thanks to not bothering
to distinguish questions of how to handle partition detachment from
questions of visibility of individual data tuples. We evidently
haven't got test coverage for this :-(, which is perhaps not so
surprising because all this behavior long predates the isolationtester
infrastructure that would've allowed us to test it mechanically.
Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).
Okay, I'll look into getting 1 and 2 done for this patch and I guess
work with Alvaro on 3.
It looks like you've addressed the other complaints I raised back in
March, so that's forward progress anyway. I do still find myself a
bit dissatisfied with the code factorization, because it seems like
find_leaf_pk_rel() doesn't belong here but rather in some partitioning
module. OTOH, if that means exposing RI_ConstraintInfo to the world,
that wouldn't be nice either.
Hm yeah, fair point about the undesirability of putting partitioning
details into ri_triggers.c, so will look into refactoring to avoid
that.
--
Amit Langote
EDB: http://www.enterprisedb.com
On Fri, Nov 12, 2021 at 10:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).
I wrote that thinking that point (2), ie fix the choice of snapshots for
these RI queries, would solve the brokenness in partitioned tables,
so that (3) would potentially only require hacking up v14.
However after thinking more I realize that (2) will break the desired
behavior for concurrent partition detaches, because that's being driven
off ActiveSnapshot. So we really need a solution that decouples the
partition detachment logic from ActiveSnapshot, in both branches.
ISTM that the latest snapshot would still have to be passed to
find_inheritance_children_extended() *somehow* by ri_triggers.c. IIUC
the problem with using the ActiveSnapshot mechanism to do that is that
it causes the SPI query to see even user table rows that it shouldn't
be able to, so that is why you say it is too global a mechanism for
this hack.
Whatever mechanism we will use would still need to involve setting a
global Snapshot variable though, right?
--
Amit Langote
EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
Whatever mechanism we will use would still need to involve setting a
global Snapshot variable though, right?
In v14 we'll certainly still be passing the snapshot(s) to SPI, which will
eventually make the snapshot active. With your patch, since we're just
handing the snapshot to the scan mechanism, it seems at least
theoretically possible that we'd not have to do PushActiveSnapshot on it.
Not doing so might be a bad idea however; if there is any user-defined
code getting called, it might have expectations about ActiveSnapshot being
relevant. On the whole I'd be inclined to say that we still want the
RI test_snapshot to be the ActiveSnapshot while performing the test.
regards, tom lane
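Concretely, that suggests a shape like the following, which is roughly what
the v10 patch posted below ends up doing in ri_ReferencedKeyExists(); this
is a sketch, not the patch verbatim:

Snapshot	snap = GetTransactionSnapshot();

/*
 * Keep the RI test snapshot active while performing the check, so any
 * user-defined code reached from here sees a sensible ActiveSnapshot,
 * even though the index scan is handed the snapshot directly.
 */
PushActiveSnapshot(snap);
scan = index_beginscan(pk_rel, idxrel, snap, nkeys, 0);
/* ... index_rescan(), index_getnext_slot(), ExecLockTableTuple() ... */
index_endscan(scan);
PopActiveSnapshot();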
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Nov 12, 2021 at 8:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).
Okay, I'll look into getting 1 and 2 done for this patch and I guess
work with Alvaro on 3.
Actually, it seems that DETACH PARTITION is broken for concurrent
serializable/repeatable-read transactions quite independently of
whether they attempt to make any FK checks [1]/messages/by-id/1849918.1636748862@sss.pgh.pa.us. If we do what
I speculated about there, namely wait out all such xacts before
detaching, it might be possible to fix (3) just by reverting the
problematic change in ri_triggers.c. I'm thinking the wait would
render it unnecessary to get FK checks to do anything weird about
partition lookup. But I might well be missing something.
regards, tom lane
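For reference, CREATE INDEX CONCURRENTLY already implements this kind of
wait via the WaitForOlderSnapshots() helper (currently static in
indexcmds.c). A sketch of what the speculated wait might look like on the
DETACH side, assuming that helper were exported:

/*
 * Sketch only: wait out every transaction whose snapshot is older than
 * ours, so that no repeatable-read transaction that could still consider
 * the partition attached remains running when the detach completes.
 */
Snapshot	snap = GetActiveSnapshot();

WaitForOlderSnapshots(snap->xmin, false);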
On Sat, Nov 13, 2021 at 5:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Fri, Nov 12, 2021 at 8:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anyway, I think that (1) we should write some more test cases around
this behavior, (2) you need to establish the snapshot to use in two
different ways for the RI_FKey_check and ri_Check_Pk_Match cases,
and (3) something's going to have to be done to repair the behavior
in v14 (unless we want to back-patch this into v14, which seems a
bit scary).
Okay, I'll look into getting 1 and 2 done for this patch and I guess
work with Alvaro on 3.
Actually, it seems that DETACH PARTITION is broken for concurrent
serializable/repeatable-read transactions quite independently of
whether they attempt to make any FK checks [1]. If we do what
I speculated about there, namely wait out all such xacts before
detaching, it might be possible to fix (3) just by reverting the
problematic change in ri_triggers.c. I'm thinking the wait would
render it unnecessary to get FK checks to do anything weird about
partition lookup. But I might well be missing something.
I wasn't able to make much headway on how we might get rid of the
DETACH-related partition descriptor hacks, that is, item (3), though
I made some progress on items (1) and (2).
For (1), the attached 0001 patch adds a new isolation suite
fk-snapshot.spec to exercise snapshot behaviors in the cases where we
no longer go through SPI. It helped find some problems with the
snapshot handling in the earlier versions of the patch, mainly with
partitioned PK tables. It also contains a test along the lines of the
example you showed upthread, which shows that the partition descriptor
hack's requirement that ActiveSnapshot be set leads to wrong results.
The patch includes the buggy output for that test case, marked as
such in a comment above the test.
In the updated 0002, I fixed things such that the snapshot-setting
required by the partition descriptor hack is independent of that of
the RI query, so it no longer causes the PK index scan to return
rows that the RI query mustn't see. That fixes the visibility bug
illustrated in your example, which, as mentioned, is also exercised
in the new test suite.
I also moved find_leaf_pk_rel() into execPartition.c with a new name
and a new set of parameters.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v10-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch
From de3edc3d4a0be0d60e164d4a67ec1cd1d6731bcf Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 15 Nov 2021 18:22:33 +0900
Subject: [PATCH v10 1/2] Add isolation tests for snapshot behavior in
ri_triggers.c
They are to check the behavior of RI_FKey_check() and
ri_Check_Pk_Match(). A test case whereby RI_FKey_check() queries a
partitioned PK table under REPEATABLE READ isolation produces wrong
output due to a bug in the partition-descriptor logic, which is
noted as such in the comment above the test. A subsequent patch
will fix the bug and replace the buggy output with the correct one.
---
src/test/isolation/expected/fk-snapshot.out | 124 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/fk-snapshot.spec | 61 ++++++++++
3 files changed, 186 insertions(+)
create mode 100644 src/test/isolation/expected/fk-snapshot.out
create mode 100644 src/test/isolation/specs/fk-snapshot.spec
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
new file mode 100644
index 0000000000..5faf80d6ce
--- /dev/null
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -0,0 +1,124 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+ERROR: insert or update on table "fk_parted_pk_2" violates foreign key constraint "fk_parted_pk_a_fkey"
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+2
+(2 rows)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brr s1dfp s1ifp1 s1c s1sfn
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s1brc s1dfp s1ifp1 s1c s1sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index f4c01006fc..6507afb49c 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -33,6 +33,7 @@ test: fk-deadlock
test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
+test: fk-snapshot
test: eval-plan-qual
test: eval-plan-qual-trigger
test: lock-update-delete
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
new file mode 100644
index 0000000000..378507fbc3
--- /dev/null
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -0,0 +1,61 @@
+setup
+{
+ CREATE TABLE pk_noparted (
+ a int PRIMARY KEY
+ );
+
+ CREATE TABLE fk_parted_pk (
+ a int PRIMARY KEY REFERENCES pk_noparted ON DELETE CASCADE
+ ) PARTITION BY LIST (a);
+ CREATE TABLE fk_parted_pk_1 PARTITION OF fk_parted_pk FOR VALUES IN (1);
+ CREATE TABLE fk_parted_pk_2 PARTITION OF fk_parted_pk FOR VALUES IN (2);
+
+ CREATE TABLE fk_noparted (
+ a int REFERENCES fk_parted_pk ON DELETE NO ACTION INITIALLY DEFERRED
+ );
+ INSERT INTO pk_noparted VALUES (1);
+ INSERT INTO fk_parted_pk VALUES (1);
+ INSERT INTO fk_noparted VALUES (1);
+}
+
+teardown
+{
+ DROP TABLE pk_noparted, fk_parted_pk, fk_noparted;
+}
+
+session s1
+step s1brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s1brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s1ifp2 { INSERT INTO fk_parted_pk VALUES (2); }
+step s1ifp1 { INSERT INTO fk_parted_pk VALUES (1); }
+step s1dfp { DELETE FROM fk_parted_pk WHERE a = 1; }
+step s1c { COMMIT; }
+step s1sfp { SELECT * FROM fk_parted_pk; }
+step s1sp { SELECT * FROM pk_noparted; }
+step s1sfn { SELECT * FROM fk_noparted; }
+
+session s2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s2ip2 { INSERT INTO pk_noparted VALUES (2); }
+step s2ifn2 { INSERT INTO fk_noparted VALUES (2); }
+step s2c { COMMIT; }
+step s2sfp { SELECT * FROM fk_parted_pk; }
+step s2sfn { SELECT * FROM fk_noparted; }
+
+# inserting into referencing tables in transaction-snapshot mode
+# PK table is non-partitioned
+permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+# PK table is partitioned: buggy, because the latest snapshot, which must be
+# pushed for partition lookup to work correctly, also ends up getting used by
+# the PK index scan, letting s2's REPEATABLE READ transaction see a PK row
+# that its transaction snapshot should not see
+permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+
+# inserting into referencing tables in up-to-date snapshot mode
+permutation s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+
+# deleting a referenced row and then inserting again in the same transaction; works
+# the same no matter the snapshot mode
+permutation s1brr s1dfp s1ifp1 s1c s1sfn
+permutation s1brc s1dfp s1ifp1 s1c s1sfn
--
2.24.1
v10-0002-Avoid-using-SPI-for-some-RI-checks.patch
From 2cb4de661dc1c5a47b1ca7f4e52a29317f267bcd Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v10 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
This rewrite also allows us to fix a PK row visibility bug caused by
a partition descriptor hack which requires ActiveSnapshot to be set
to come up with the correct set of partitions for the RI query
running under REPEATABLE READ isolation. We now set that snapshot
independently of the snapshot to be used by the PK index scan, so
the two no longer interfere. The buggy output of the relevant test
case in src/test/isolation/expected/fk-snapshot.out has been
corrected.
---
src/backend/executor/execPartition.c | 159 +++++-
src/backend/executor/nodeLockRows.c | 161 +++---
src/backend/utils/adt/ri_triggers.c | 534 +++++++++++---------
src/include/executor/execPartition.h | 5 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 562 insertions(+), 315 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5c723bc54e..9c1d7c2dbf 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -175,8 +175,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -310,7 +311,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1240,12 +1243,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1337,6 +1340,150 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * find_leaf_part_for_key
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple, which contains a subset of the
+ * table's columns including all of the partition key columns.
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+find_leaf_part_for_key(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of the key's attribute
+ * numbers (key_attnums), we must map any non-root table's partition
+ * key attribute numbers to the root table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /*
+ * No partition found, so the key we are looking for cannot exist
+ * either. Clean up before returning; note that partidx must not be
+ * used to index into the PartitionDesc arrays when it is negative.
+ */
+ if (partidx < 0)
+ {
+ DestroyPartitionDirectory(partdir);
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+ return NULL;
+ }
+
+ partoid = partdesc->oids[partidx];
+ is_leaf = partdesc->is_leaf[partidx];
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ rel = table_open(partoid, RowShareLock);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 7583973f4a..77ac776e27 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,91 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in the
+ * slot, and EvalPlanQual rechecking is flagged for the caller
+ * if it asked for that.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 96269fc2ad..85a410bed8 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,16 +74,11 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
-/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE 3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE 4
-#define RI_PLAN_RESTRICT_CHECKREF 5
-#define RI_PLAN_SETNULL_DOUPDATE 6
-#define RI_PLAN_SETDEFAULT_DOUPDATE 7
+#define RI_PLAN_CASCADE_DEL_DODELETE 1
+#define RI_PLAN_CASCADE_UPD_DOUPDATE 2
+#define RI_PLAN_RESTRICT_CHECKREF 3
+#define RI_PLAN_SETNULL_DOUPDATE 4
+#define RI_PLAN_SETDEFAULT_DOUPDATE 5
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -224,8 +225,245 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap = InvalidSnapshot;
+ bool pushed_latest_snapshot = false;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * In the case of scanning the PK index for ri_Check_Pk_Match(), we'd like
+ * to see all rows that could be interesting, even those that would not be
+ * visible to the transaction snapshot. To do so, force-push the latest
+ * snapshot.
+ *
+ * Also, increment the command counter to make the changes of the current
+ * command visible in all cases.
+ */
+ CommandCounterIncrement();
+ if (fk_rel == NULL)
+ {
+ snap = GetLatestSnapshot();
+ PushActiveSnapshot(snap);
+ pushed_latest_snapshot = true;
+ }
+ else
+ {
+ snap = GetTransactionSnapshot();
+ PushActiveSnapshot(snap);
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap = InvalidSnapshot;
+
+ /*
+ * XXX the partition descriptor machinery has a hack that assumes that
+ * the queries originating in this module push the latest snapshot in
+ * the transaction-snapshot mode. If we didn't push one already, do
+ * so here.
+ */
+ if (!pushed_latest_snapshot)
+ {
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+ }
+
+ leaf_pk_rel = find_leaf_part_for_key(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, &leaf_idxoid);
+ /*
+ * XXX done fiddling with the partition descriptor machinery so unset
+ * the active snapshot if we must.
+ */
+ if (mysnap != InvalidSnapshot)
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking for
+ * cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for a null value. Nulls should not occur here, because
+ * callers currently handle the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ PopActiveSnapshot();
+
+ return found;
+}
/*
* RI_FKey_check -
@@ -239,8 +477,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -320,9 +556,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -337,74 +573,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -459,81 +633,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1549,15 +1652,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1774,7 +1872,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1911,26 +2009,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2199,19 +2296,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2244,9 +2333,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2256,33 +2346,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2359,14 +2422,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2397,9 +2458,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2407,22 +2468,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2524,7 +2583,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2831,7 +2890,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2887,8 +2949,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..e10f0d72db 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,5 +124,10 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern Relation find_leaf_part_for_key(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, Oid *leaf_idxoid);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index cd57a704ad..51c9e86106 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in nodeLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's snapshot-isolation
-# transaction can see the uncommitted row, since the latest snapshot taken
-# for the partition lookup to work correctly also ends up getting used by
-# the PK index scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.24.1
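To make the change concrete: with the patch, an insert into a
referencing table whose PK table is partitioned no longer runs a
SELECT over SPI; ri_ReferencedKeyExists() descends to the leaf
partition with find_leaf_part_for_key() and scans that partition's
unique index directly. A minimal SQL sketch (table names are mine,
for illustration only):

create table pk (a int primary key) partition by list (a);
create table pk1 partition of pk for values in (1);
create table pk2 partition of pk for values in (2);
insert into pk values (1), (2);
create table fk (a int references pk);
-- the RI check for this insert locates leaf partition pk2, scans its
-- primary key index for the value 2, and key-share locks the row found
insert into fk values (2);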
I wasn't able to make much headway on how we might get rid of the
DETACH-related partition descriptor hacks, item (3), though I made
some progress on items (1) and (2).

For (1), the attached 0001 patch adds a new isolation suite,
fk-snapshot.spec, to exercise snapshot behavior in the cases where we
no longer go through SPI. It helped find some problems with the
snapshot handling in earlier versions of the patch, mainly with
partitioned PK tables. It also contains a test along the lines of the
example you showed upthread, which demonstrates that the partition
descriptor hack's requirement that ActiveSnapshot be set leads to
wrong results. The patch includes the buggy output for that test
case, marked as such in a comment above the test.

In the updated 0002, I made the snapshot-setting required by the
partition descriptor hack independent of the snapshot-setting of the
RI query, so that the former no longer causes the PK index scan to
return rows that the RI query mustn't see. That fixes the visibility
bug illustrated in your example, which is, as mentioned, also
exercised in the new test suite.

I also moved find_leaf_pk_rel() into execPartition.c with a new name
and a new set of parameters.

--
Amit Langote
EDB: http://www.enterprisedb.com
Sorry for the delay. This patch no longer applies; it has some
conflicts with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Sorry for the delay. This patch no longer applies; it has some conflicts with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Thanks Corey for the heads up. Rebased with some cosmetic adjustments.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v11-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch
From a49015bdce4ad50b2953c3b253130568408afe6e Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 15 Nov 2021 18:22:33 +0900
Subject: [PATCH v11 1/2] Add isolation tests for snapshot behavior in
ri_triggers.c
They are to check the behavior of RI_FKey_check() and
ri_Check_Pk_Match(). A test case whereby RI_FKey_check() queries a
partitioned PK table under REPEATABLE READ isolation produces wrong
output due to a bug in the partition-descriptor logic, which is
noted as such in the comment above the test. A subsequent patch
will fix the bug and replace the buggy output with the correct one.
---
src/test/isolation/expected/fk-snapshot.out | 124 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/fk-snapshot.spec | 61 ++++++++++
3 files changed, 186 insertions(+)
create mode 100644 src/test/isolation/expected/fk-snapshot.out
create mode 100644 src/test/isolation/specs/fk-snapshot.spec
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
new file mode 100644
index 0000000000..5faf80d6ce
--- /dev/null
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -0,0 +1,124 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+ERROR: insert or update on table "fk_parted_pk_2" violates foreign key constraint "fk_parted_pk_a_fkey"
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+2
+(2 rows)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brr s1dfp s1ifp1 s1c s1sfn
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s1brc s1dfp s1ifp1 s1c s1sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 99c23b16ff..90f29fd278 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -33,6 +33,7 @@ test: fk-deadlock
test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
+test: fk-snapshot
test: eval-plan-qual
test: eval-plan-qual-trigger
test: lock-update-delete
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
new file mode 100644
index 0000000000..378507fbc3
--- /dev/null
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -0,0 +1,61 @@
+setup
+{
+ CREATE TABLE pk_noparted (
+ a int PRIMARY KEY
+ );
+
+ CREATE TABLE fk_parted_pk (
+ a int PRIMARY KEY REFERENCES pk_noparted ON DELETE CASCADE
+ ) PARTITION BY LIST (a);
+ CREATE TABLE fk_parted_pk_1 PARTITION OF fk_parted_pk FOR VALUES IN (1);
+ CREATE TABLE fk_parted_pk_2 PARTITION OF fk_parted_pk FOR VALUES IN (2);
+
+ CREATE TABLE fk_noparted (
+ a int REFERENCES fk_parted_pk ON DELETE NO ACTION INITIALLY DEFERRED
+ );
+ INSERT INTO pk_noparted VALUES (1);
+ INSERT INTO fk_parted_pk VALUES (1);
+ INSERT INTO fk_noparted VALUES (1);
+}
+
+teardown
+{
+ DROP TABLE pk_noparted, fk_parted_pk, fk_noparted;
+}
+
+session s1
+step s1brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s1brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s1ifp2 { INSERT INTO fk_parted_pk VALUES (2); }
+step s1ifp1 { INSERT INTO fk_parted_pk VALUES (1); }
+step s1dfp { DELETE FROM fk_parted_pk WHERE a = 1; }
+step s1c { COMMIT; }
+step s1sfp { SELECT * FROM fk_parted_pk; }
+step s1sp { SELECT * FROM pk_noparted; }
+step s1sfn { SELECT * FROM fk_noparted; }
+
+session s2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s2ip2 { INSERT INTO pk_noparted VALUES (2); }
+step s2ifn2 { INSERT INTO fk_noparted VALUES (2); }
+step s2c { COMMIT; }
+step s2sfp { SELECT * FROM fk_parted_pk; }
+step s2sfn { SELECT * FROM fk_noparted; }
+
+# inserting into referencing tables in transaction-snapshot mode
+# PK table is non-partitioned
+permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+# PK table is partitioned: buggy, because s2's snapshot-isolation
+# transaction can see the uncommitted row, since the latest snapshot taken
+# for the partition lookup to work correctly also ends up getting used by
+# the PK index scan
+permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+
+# inserting into referencing tables in up-to-date snapshot mode
+permutation s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+
+# deleting a referenced row and then inserting again in the same transaction; works
+# the same no matter the snapshot mode
+permutation s1brr s1dfp s1ifp1 s1c s1sfn
+permutation s1brc s1dfp s1ifp1 s1c s1sfn
--
2.24.1
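To spell out the permutation that 0002 corrects (the one marked buggy
above), here is the same scenario as a condensed two-session
transcript; the tables are the ones created in fk-snapshot.spec, and
the session labels and comments are mine. With 0002 applied, the
deferred RI check runs with the transaction snapshot and so fails at
commit instead of silently succeeding:

-- session 2 (autocommit)
insert into pk_noparted values (2);
-- session 2
begin isolation level repeatable read;
-- session 1
begin isolation level read committed;
insert into fk_parted_pk values (2);
-- session 2: the first query takes the transaction snapshot; sees only (1)
select * from fk_parted_pk;
-- session 1
commit;
-- session 2: still sees only (1) under REPEATABLE READ
select * from fk_parted_pk;
-- session 2: the FK is INITIALLY DEFERRED, so no RI check happens yet
insert into fk_noparted values (2);
-- session 2: the deferred RI check cannot see (2) in fk_parted_pk
commit;
-- ERROR: insert or update on table "fk_noparted" violates foreign key
-- constraint "fk_noparted_a_fkey"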
v11-0002-Avoid-using-SPI-for-some-RI-checks.patch
From 1e6e2973a738ca2de8c95978a84932b936215450 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v11 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
This rewrite also makes it possible to fix a PK row visibility bug
caused by a partition descriptor hack which requires ActiveSnapshot
to be set in order to come up with the correct set of partitions for
the RI query running under REPEATABLE READ isolation. We now set
that snapshot independently of the snapshot to be used by the PK
index scan, so the two no longer interfere. The buggy output of the
relevant test case in src/test/isolation/expected/fk-snapshot.out
has been corrected.
---
src/backend/executor/execPartition.c | 160 +++++-
src/backend/executor/nodeLockRows.c | 161 +++---
src/backend/utils/adt/ri_triggers.c | 538 +++++++++++---------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 567 insertions(+), 316 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5c723bc54e..4b90290f11 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -175,8 +175,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -310,7 +311,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1240,12 +1243,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1337,6 +1340,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * find_leaf_part_for_key
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple containing a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+find_leaf_part_for_key(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /*
+ * If a partition was found, fetch its OID and leaf-ness now, before
+ * the partition directory that owns partdesc goes away.
+ */
+ if (partidx >= 0)
+ {
+ partoid = partdesc->oids[partidx];
+ is_leaf = partdesc->is_leaf[partidx];
+ }
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 7583973f4a..77ac776e27 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,91 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in
+ * the slot for EvalPlanQual testing, if the caller asked for it.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 8ebb2a50a1..6a06e43114 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,19 +74,15 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_ONDELETE 3
-#define RI_PLAN_CASCADE_ONUPDATE 4
+#define RI_PLAN_CASCADE_ONDELETE 1
+#define RI_PLAN_CASCADE_ONUPDATE 2
/* For RESTRICT, the same plan can be used for both ON DELETE and ON UPDATE triggers. */
-#define RI_PLAN_RESTRICT 5
-#define RI_PLAN_SETNULL_ONDELETE 6
-#define RI_PLAN_SETNULL_ONUPDATE 7
-#define RI_PLAN_SETDEFAULT_ONDELETE 8
-#define RI_PLAN_SETDEFAULT_ONUPDATE 9
+#define RI_PLAN_RESTRICT 3
+#define RI_PLAN_SETNULL_ONDELETE 4
+#define RI_PLAN_SETNULL_ONUPDATE 5
+#define RI_PLAN_SETDEFAULT_ONDELETE 6
+#define RI_PLAN_SETDEFAULT_ONUPDATE 7
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -229,8 +231,246 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap = InvalidSnapshot;
+ bool pushed_latest_snapshot = false;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * In the case of scanning the PK index for ri_Check_Pk_Match(), we'd like
+ * to see all rows that could be interesting, even those that would not be
+ * visible to the transaction snapshot. To do so, force-push the latest
+ * snapshot.
+ *
+ * Also, increment the command counter to make the changes of the current
+ * command visible in all cases.
+ */
+ CommandCounterIncrement();
+ if (fk_rel == NULL)
+ {
+ snap = GetLatestSnapshot();
+ PushActiveSnapshot(snap);
+ pushed_latest_snapshot = true;
+ }
+ else
+ {
+ snap = GetTransactionSnapshot();
+ PushActiveSnapshot(snap);
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index in which to search for the key.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap = InvalidSnapshot;
+
+ /*
+ * XXX the partition descriptor machinery has a hack that assumes that
+ * the queries originating in this module push the latest snapshot in
+ * the transaction-snapshot mode. If we didn't push one already, do
+ * so here.
+ */
+ if (!pushed_latest_snapshot)
+ {
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+ }
+
+ leaf_pk_rel = find_leaf_part_for_key(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * XXX done fiddling with the partition descriptor machinery so unset
+ * the active snapshot if we must.
+ */
+ if (mysnap != InvalidSnapshot)
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, neither does the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which nulls do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ PopActiveSnapshot();
+
+ return found;
+}
/*
* RI_FKey_check -
@@ -244,8 +484,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -325,9 +563,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -342,74 +580,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -464,81 +640,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1602,15 +1707,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1827,7 +1927,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1964,26 +2064,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2254,19 +2353,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2299,9 +2390,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2311,33 +2403,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2414,14 +2479,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2452,9 +2515,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2462,22 +2525,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2579,7 +2640,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2886,7 +2947,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2942,8 +3006,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..cfdd73f95d 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,5 +124,11 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern Relation find_leaf_part_for_key(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index cd57a704ad..51c9e86106 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.24.1
On Sun, Dec 19, 2021 at 10:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Sorry for the delay. This patch no longer applies, it has some conflict
with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Thanks Corey for the heads up. Rebased with some cosmetic adjustments.
Hi,
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
If partidx < 0, do we still need to fill out partoid and is_leaf ? It seems
we can return early based on (should call table_close(rel) first):
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
Cheers
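For reference, the fix that ended up in the updated patch moves the early
return ahead of the partdesc->oids[] access, exactly as suggested. Abridged
from the find_leaf_part_for_key() loop in the v12-0002 patch below; not a
complete function:

    /* Find the partition for the key. */
    partidx = get_partition_for_tuple(partkey, partdesc,
                                      partkey_vals, partkey_isnull);
    Assert(partidx < 0 || partidx < partdesc->nparts);

    /* done using the partition directory */
    DestroyPartitionDirectory(partdir);

    /* close any intermediate parents we opened */
    if (rel != root_rel)
        table_close(rel, NoLock);

    /* No partition found; bail out before touching partdesc->oids[]. */
    if (partidx < 0)
        return NULL;

    partoid = partdesc->oids[partidx];
    rel = table_open(partoid, lockmode);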
On Mon, Dec 20, 2021 at 6:19 PM Zhihong Yu <zyu@yugabyte.com> wrote:
On Sun, Dec 19, 2021 at 10:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Sorry for the delay. This patch no longer applies, it has some conflict with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Thanks Corey for the heads up. Rebased with some cosmetic adjustments.
Hi,
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
If partidx < 0, do we still need to fill out partoid and is_leaf ? It seems
we can return early based on (should call table_close(rel) first):
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
Good catch, thanks. Patch updated.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
Attachment: v12-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch (application/octet-stream)
From a49015bdce4ad50b2953c3b253130568408afe6e Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 15 Nov 2021 18:22:33 +0900
Subject: [PATCH v12 1/2] Add isolation tests for snapshot behavior in
ri_triggers.c
They are to check the behavior of RI_FKey_check() and
ri_Check_Pk_Match(). A test case whereby RI_FKey_check() queries a
partitioned PK table under REPEATABLE READ isolation produces wrong
output due to a bug in the partition-descriptor logic, which is
noted as such in the comment above the test. A subsequent patch
will fix the bug and replace the buggy output with the correct one.
---
src/test/isolation/expected/fk-snapshot.out | 124 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/fk-snapshot.spec | 61 ++++++++++
3 files changed, 186 insertions(+)
create mode 100644 src/test/isolation/expected/fk-snapshot.out
create mode 100644 src/test/isolation/specs/fk-snapshot.spec
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
new file mode 100644
index 0000000000..5faf80d6ce
--- /dev/null
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -0,0 +1,124 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+ERROR: insert or update on table "fk_parted_pk_2" violates foreign key constraint "fk_parted_pk_a_fkey"
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+2
+(2 rows)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brr s1dfp s1ifp1 s1c s1sfn
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s1brc s1dfp s1ifp1 s1c s1sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 99c23b16ff..90f29fd278 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -33,6 +33,7 @@ test: fk-deadlock
test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
+test: fk-snapshot
test: eval-plan-qual
test: eval-plan-qual-trigger
test: lock-update-delete
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
new file mode 100644
index 0000000000..378507fbc3
--- /dev/null
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -0,0 +1,61 @@
+setup
+{
+ CREATE TABLE pk_noparted (
+ a int PRIMARY KEY
+ );
+
+ CREATE TABLE fk_parted_pk (
+ a int PRIMARY KEY REFERENCES pk_noparted ON DELETE CASCADE
+ ) PARTITION BY LIST (a);
+ CREATE TABLE fk_parted_pk_1 PARTITION OF fk_parted_pk FOR VALUES IN (1);
+ CREATE TABLE fk_parted_pk_2 PARTITION OF fk_parted_pk FOR VALUES IN (2);
+
+ CREATE TABLE fk_noparted (
+ a int REFERENCES fk_parted_pk ON DELETE NO ACTION INITIALLY DEFERRED
+ );
+ INSERT INTO pk_noparted VALUES (1);
+ INSERT INTO fk_parted_pk VALUES (1);
+ INSERT INTO fk_noparted VALUES (1);
+}
+
+teardown
+{
+ DROP TABLE pk_noparted, fk_parted_pk, fk_noparted;
+}
+
+session s1
+step s1brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s1brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s1ifp2 { INSERT INTO fk_parted_pk VALUES (2); }
+step s1ifp1 { INSERT INTO fk_parted_pk VALUES (1); }
+step s1dfp { DELETE FROM fk_parted_pk WHERE a = 1; }
+step s1c { COMMIT; }
+step s1sfp { SELECT * FROM fk_parted_pk; }
+step s1sp { SELECT * FROM pk_noparted; }
+step s1sfn { SELECT * FROM fk_noparted; }
+
+session s2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s2ip2 { INSERT INTO pk_noparted VALUES (2); }
+step s2ifn2 { INSERT INTO fk_noparted VALUES (2); }
+step s2c { COMMIT; }
+step s2sfp { SELECT * FROM fk_parted_pk; }
+step s2sfn { SELECT * FROM fk_noparted; }
+
+# inserting into referencing tables in transaction-snapshot mode
+# PK table is non-partitioned
+permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+# PK table is partitioned: buggy, because s2's serialization transaction can
+# see the uncommitted row thanks to the latest snapshot taken for
+# partition lookup to work correctly also ends up getting used by the PK index
+# scan
+permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+
+# inserting into referencing tables in up-to-date snapshot mode
+permutation s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+
+# deleting a referenced row and then inserting again in the same transaction; works
+# the same no matter the snapshot mode
+permutation s1brr s1dfp s1ifp1 s1c s1sfn
+permutation s1brc s1dfp s1ifp1 s1c s1sfn
--
2.24.1
Attachment: v12-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From a9651720ddd82dd659a50c70dfa04278960d1ffc Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v12 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
This rewrite allows us to fix a PK row visibility bug caused by a
partition descriptor hack which requires ActiveSnapshot to be set to
come up with the correct set of partitions for the RI query running
under REPEATABLE READ isolation. We now set that snapshot
independently of the snapshot to be used by the PK index scan, so the
two no longer interfere. The buggy output in
src/test/isolation/expected/fk-snapshot.out of the relevant test
case has been corrected.
---
src/backend/executor/execPartition.c | 160 +++++-
src/backend/executor/nodeLockRows.c | 161 +++---
src/backend/utils/adt/ri_triggers.c | 538 +++++++++++---------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 567 insertions(+), 316 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5c723bc54e..6549a85746 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -175,8 +175,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -310,7 +311,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1240,12 +1243,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1337,6 +1340,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * find_leaf_part_for_key
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple containing a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+find_leaf_part_for_key(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 7583973f4a..77ac776e27 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,91 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 8ebb2a50a1..6a06e43114 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,19 +74,15 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_ONDELETE 3
-#define RI_PLAN_CASCADE_ONUPDATE 4
+#define RI_PLAN_CASCADE_ONDELETE 1
+#define RI_PLAN_CASCADE_ONUPDATE 2
/* For RESTRICT, the same plan can be used for both ON DELETE and ON UPDATE triggers. */
-#define RI_PLAN_RESTRICT 5
-#define RI_PLAN_SETNULL_ONDELETE 6
-#define RI_PLAN_SETNULL_ONUPDATE 7
-#define RI_PLAN_SETDEFAULT_ONDELETE 8
-#define RI_PLAN_SETDEFAULT_ONUPDATE 9
+#define RI_PLAN_RESTRICT 3
+#define RI_PLAN_SETNULL_ONDELETE 4
+#define RI_PLAN_SETNULL_ONUPDATE 5
+#define RI_PLAN_SETDEFAULT_ONDELETE 6
+#define RI_PLAN_SETDEFAULT_ONUPDATE 7
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -229,8 +231,246 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap = InvalidSnapshot;
+ bool pushed_latest_snapshot = false;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid save_userid;
+ int save_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * In the case of scanning the PK index for ri_Check_Pk_Match(), we'd like
+ * to see all rows that could be interesting, even those that would not be
+ * visible to the transaction snapshot. To do so, force-push the latest
+ * snapshot.
+ *
+ * Also, increment the command counter to make the changes of the current
+ * command visible in all cases.
+ */
+ CommandCounterIncrement();
+ if (fk_rel == NULL)
+ {
+ snap = GetLatestSnapshot();
+ PushActiveSnapshot(snap);
+ pushed_latest_snapshot = true;
+ }
+ else
+ {
+ snap = GetTransactionSnapshot();
+ PushActiveSnapshot(snap);
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap = InvalidSnapshot;
+
+ /*
+ * XXX the partition descriptor machinery has a hack that assumes that
+ * the queries originating in this module push the latest snapshot in
+ * the transaction-snapshot mode. If we didn't push one already, do
+ * so here.
+ */
+ if (!pushed_latest_snapshot)
+ {
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+ }
+
+ leaf_pk_rel = find_leaf_part_for_key(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * XXX done fiddling with the partition descriptor machinery so unset
+ * the active snapshot if we must.
+ */
+ if (mysnap != InvalidSnapshot)
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+ PopActiveSnapshot();
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ PopActiveSnapshot();
+
+ return found;
+}
/*
* RI_FKey_check -
@@ -244,8 +484,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -325,9 +563,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -342,74 +580,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -464,81 +640,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1602,15 +1707,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1827,7 +1927,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1964,26 +2064,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2254,19 +2353,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2299,9 +2390,10 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
+ bool source_is_pk = true;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2311,33 +2403,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
@@ -2414,14 +2479,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2452,9 +2515,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2462,22 +2525,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2579,7 +2640,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2886,7 +2947,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2942,8 +3006,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..cfdd73f95d 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,5 +124,11 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern Relation find_leaf_part_for_key(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index cd57a704ad..51c9e86106 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.24.1
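For reference, the visibility fix described in this patch's commit message
comes down to pushing two separate snapshots in ri_ReferencedKeyExists():
one for the index scan, and a transiently pushed latest snapshot that only
the partition-descriptor machinery sees. Abridged from the patch above,
with elisions:

    CommandCounterIncrement();
    if (fk_rel == NULL)
    {
        /* ri_Check_Pk_Match(): see even rows invisible to the xact snapshot */
        snap = GetLatestSnapshot();
        PushActiveSnapshot(snap);
        pushed_latest_snapshot = true;
    }
    else
    {
        /* RI_FKey_check(): the index scan uses the transaction snapshot */
        snap = GetTransactionSnapshot();
        PushActiveSnapshot(snap);
    }

    ...

    /* The partition lookup needs the latest snapshot; push it transiently. */
    if (!pushed_latest_snapshot)
    {
        mysnap = GetLatestSnapshot();
        PushActiveSnapshot(mysnap);
    }
    leaf_pk_rel = find_leaf_part_for_key(pk_rel, riinfo->nkeys,
                                         riinfo->pk_attnums,
                                         pk_vals, pk_nulls,
                                         idxoid, RowShareLock,
                                         &leaf_idxoid);
    if (mysnap != InvalidSnapshot)
        PopActiveSnapshot();

    ...

    /* The scan itself still uses 'snap', not the partition-lookup snapshot. */
    scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);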
Good catch, thanks. Patch updated.
Applies clean. Passes check-world.
On Mon, Dec 20, 2021 at 5:17 AM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Dec 20, 2021 at 6:19 PM Zhihong Yu <zyu@yugabyte.com> wrote:
On Sun, Dec 19, 2021 at 10:20 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Sorry for the delay. This patch no longer applies, it has some
conflict with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Thanks Corey for the heads up. Rebased with some cosmetic adjustments.
Hi,
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+ partoid = partdesc->oids[partidx];
If partidx < 0, do we still need to fill out partoid and is_leaf ? It
seems we can return early based on (should call table_close(rel) first):
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
Good catch, thanks. Patch updated.
Hi,
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
The above assignment can be merged with the line where variable lockflags is
declared.
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
save_userid -> saved_userid
save_sec_context -> saved_sec_context
+ * the transaction-snapshot mode. If we didn't push one already, do
didn't push -> haven't pushed
For ri_PerformCheck():
+ bool source_is_pk = true;
It seems the value of source_is_pk doesn't change - the value true can be
plugged into ri_ExtractValues() calls directly.
Cheers
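For reference, a minimal sketch of what the first and last of these
suggestions would look like. This is a hypothetical fragment, not lines from
any posted patch; the ri_ExtractValues() signature is as used elsewhere in
the v12 patch:

    /* In ExecLockTableTuple(): fold the assignment into the declaration. */
    int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
    TM_Result test;

    if (!IsolationUsesXactSnapshot())
        lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;

    /* In ri_PerformCheck(): pass the constant directly, since source_is_pk
     * never changes, and drop the variable. */
    ri_ExtractValues(source_rel, oldslot, riinfo, true, vals, nulls);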
Thanks for the review.
On Tue, Dec 21, 2021 at 5:54 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
+ int lockflags = 0;
+ TM_Result test;
+
+ lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
The above assignment can be merged with the line where variable lockflags is
declared.
Sure.
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
save_userid -> saved_userid
save_sec_context -> saved_sec_context
I agree that's better, though I guess I had kept the names as they were
in other functions.
Fixed nevertheless.
+ * the transaction-snapshot mode. If we didn't push one already, do
didn't push -> haven't pushed
Done.
For ri_PerformCheck():
+ bool source_is_pk = true;
It seems the value of source_is_pk doesn't change - the value true can be plugged into ri_ExtractValues() calls directly.
OK, done.
v13 is attached.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
Attachment: v13-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From e1c250eb3529e15334fc91138ea4b0e86a7e936a Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v13 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
This rewrite allows us to fix a PK row visibility bug caused by a
partition descriptor hack which requires ActiveSnapshot to be set to
come up with the correct set of partitions for the RI query running
under REPEATABLE READ isolation. We now set that snapshot
independently of the snapshot to be used by the PK index scan, so the
two no longer interfere. The buggy output in
src/test/isolation/expected/fk-snapshot.out of the relevant test
case has been corrected.
---
src/backend/executor/execPartition.c | 160 +++++-
src/backend/executor/nodeLockRows.c | 160 +++---
src/backend/utils/adt/ri_triggers.c | 545 +++++++++++---------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 568 insertions(+), 321 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..72ee019330 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -175,8 +175,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -310,7 +311,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1240,12 +1243,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1337,6 +1340,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * find_leaf_part_for_key
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple containing a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+find_leaf_part_for_key(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..ab54a65e0e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully, the locked tuple saved in
+ * slot for EvalPlanQual, if asked by the caller.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index c95cd32402..cbcf0754dc 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,19 +74,15 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_ONDELETE 3
-#define RI_PLAN_CASCADE_ONUPDATE 4
+#define RI_PLAN_CASCADE_ONDELETE 1
+#define RI_PLAN_CASCADE_ONUPDATE 2
/* For RESTRICT, the same plan can be used for both ON DELETE and ON UPDATE triggers. */
-#define RI_PLAN_RESTRICT 5
-#define RI_PLAN_SETNULL_ONDELETE 6
-#define RI_PLAN_SETNULL_ONUPDATE 7
-#define RI_PLAN_SETDEFAULT_ONDELETE 8
-#define RI_PLAN_SETDEFAULT_ONUPDATE 9
+#define RI_PLAN_RESTRICT 3
+#define RI_PLAN_SETNULL_ONDELETE 4
+#define RI_PLAN_SETNULL_ONUPDATE 5
+#define RI_PLAN_SETDEFAULT_ONDELETE 6
+#define RI_PLAN_SETDEFAULT_ONUPDATE 7
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -229,8 +231,246 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap = InvalidSnapshot;
+ bool pushed_latest_snapshot = false;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid saved_userid;
+ int saved_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to referenced table's owner to perform the below operations
+ * as. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * In the case of scanning the PK index for ri_Check_Pk_Match(), we'd like
+ * to see all rows that could be interesting, even those that would not be
+ * visible to the transaction snapshot. To do so, force-push the latest
+ * snapshot.
+ *
+ * Also, increment the command counter to make the changes of the current
+ * command visible in all cases.
+ */
+ CommandCounterIncrement();
+ if (fk_rel == NULL)
+ {
+ snap = GetLatestSnapshot();
+ PushActiveSnapshot(snap);
+ pushed_latest_snapshot = true;
+ }
+ else
+ {
+ snap = GetTransactionSnapshot();
+ PushActiveSnapshot(snap);
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_constraint_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap = InvalidSnapshot;
+
+ /*
+ * XXX the partition descriptor machinery has a hack that assumes that
+ * the queries originating in this module push the latest snapshot in
+ * the transaction-snapshot mode. If we haven't pushed one already, do
+ * so here.
+ */
+ if (!pushed_latest_snapshot)
+ {
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+ }
+
+ leaf_pk_rel = find_leaf_part_for_key(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * XXX done fiddling with the partition descriptor machinery so unset
+ * the active snapshot if we must.
+ */
+ if (mysnap != InvalidSnapshot)
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ PopActiveSnapshot();
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ PopActiveSnapshot();
+
+ return found;
+}
/*
* RI_FKey_check -
@@ -244,8 +484,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -325,9 +563,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -342,74 +580,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -464,81 +640,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1602,15 +1707,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1827,7 +1927,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1964,26 +2064,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2254,19 +2353,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2299,9 +2390,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2311,46 +2402,17 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
- ri_ExtractValues(source_rel, newslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, newslot, riinfo, true, vals, nulls);
if (oldslot)
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
+ ri_ExtractValues(source_rel, oldslot, riinfo, true,
vals + riinfo->nkeys, nulls + riinfo->nkeys);
}
else
{
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, oldslot, riinfo, true, vals, nulls);
}
/*
@@ -2414,14 +2476,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2452,9 +2512,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2462,22 +2522,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2579,7 +2637,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2886,7 +2944,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2942,8 +3003,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..e63dcb12f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,5 +124,11 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern Relation find_leaf_part_for_key(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 344399f6a8..8f32353a44 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.24.1
v13-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch (application/octet-stream)
From c4be44711c0eb34ad386fc27085fa97db39526ce Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 15 Nov 2021 18:22:33 +0900
Subject: [PATCH v13 1/2] Add isolation tests for snapshot behavior in
ri_triggers.c
They are to check the behavior of RI_FKey_check() and
ri_Check_Pk_Match(). A test case whereby RI_FKey_check() queries a
partitioned PK table under REPEATABLE READ isolation produces wrong
output due to a bug of the partition-descriptor logic and that is
noted as such in the comment above the test. A subsequent patch
will fix the bug and replace the buggy output by the correct one.
---
src/test/isolation/expected/fk-snapshot.out | 124 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/fk-snapshot.spec | 61 ++++++++++
3 files changed, 186 insertions(+)
create mode 100644 src/test/isolation/expected/fk-snapshot.out
create mode 100644 src/test/isolation/specs/fk-snapshot.spec
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
new file mode 100644
index 0000000000..5faf80d6ce
--- /dev/null
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -0,0 +1,124 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+ERROR: insert or update on table "fk_parted_pk_2" violates foreign key constraint "fk_parted_pk_a_fkey"
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+2
+(2 rows)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brr s1dfp s1ifp1 s1c s1sfn
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s1brc s1dfp s1ifp1 s1c s1sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 99c23b16ff..90f29fd278 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -33,6 +33,7 @@ test: fk-deadlock
test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
+test: fk-snapshot
test: eval-plan-qual
test: eval-plan-qual-trigger
test: lock-update-delete
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
new file mode 100644
index 0000000000..378507fbc3
--- /dev/null
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -0,0 +1,61 @@
+setup
+{
+ CREATE TABLE pk_noparted (
+ a int PRIMARY KEY
+ );
+
+ CREATE TABLE fk_parted_pk (
+ a int PRIMARY KEY REFERENCES pk_noparted ON DELETE CASCADE
+ ) PARTITION BY LIST (a);
+ CREATE TABLE fk_parted_pk_1 PARTITION OF fk_parted_pk FOR VALUES IN (1);
+ CREATE TABLE fk_parted_pk_2 PARTITION OF fk_parted_pk FOR VALUES IN (2);
+
+ CREATE TABLE fk_noparted (
+ a int REFERENCES fk_parted_pk ON DELETE NO ACTION INITIALLY DEFERRED
+ );
+ INSERT INTO pk_noparted VALUES (1);
+ INSERT INTO fk_parted_pk VALUES (1);
+ INSERT INTO fk_noparted VALUES (1);
+}
+
+teardown
+{
+ DROP TABLE pk_noparted, fk_parted_pk, fk_noparted;
+}
+
+session s1
+step s1brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s1brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s1ifp2 { INSERT INTO fk_parted_pk VALUES (2); }
+step s1ifp1 { INSERT INTO fk_parted_pk VALUES (1); }
+step s1dfp { DELETE FROM fk_parted_pk WHERE a = 1; }
+step s1c { COMMIT; }
+step s1sfp { SELECT * FROM fk_parted_pk; }
+step s1sp { SELECT * FROM pk_noparted; }
+step s1sfn { SELECT * FROM fk_noparted; }
+
+session s2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s2ip2 { INSERT INTO pk_noparted VALUES (2); }
+step s2ifn2 { INSERT INTO fk_noparted VALUES (2); }
+step s2c { COMMIT; }
+step s2sfp { SELECT * FROM fk_parted_pk; }
+step s2sfn { SELECT * FROM fk_noparted; }
+
+# inserting into referencing tables in transaction-snapshot mode
+# PK table is non-partitioned
+permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+# PK table is partitioned: buggy, because s2's serialization transaction can
+# see the uncommitted row thanks to the latest snapshot taken for
+# partition lookup to work correctly also ends up getting used by the PK index
+# scan
+permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+
+# inserting into referencing tables in up-to-date snapshot mode
+permutation s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+
+# deleting a referenced row and then inserting again in the same transaction; works
+# the same no matter the snapshot mode
+permutation s1brr s1dfp s1ifp1 s1c s1sfn
+permutation s1brc s1dfp s1ifp1 s1c s1sfn
--
2.24.1
On Tue, Jan 18, 2022 at 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
v13 is attached.
I noticed that commit 641f3dffcdf's recent changes to
get_constraint_index() made it basically unusable for this patch's
purposes.
Reading in the thread that led to 641f3dffcdf why
get_constraint_index() was changed the way it was, I invented in the
attached updated patch a get_fkey_constraint_index() that is local to
ri_triggers.c for use by the new ri_ReferencedKeyExists(), replacing
get_constraint_index(), which no longer gives it the index it's
looking for.
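A minimal sketch of what such a helper could look like, assuming it
simply reads pg_constraint.conindid for the FK constraint without the
constraint-type filtering that get_constraint_index() now applies
(the helper in the attached patch may differ):

    /*
     * Sketch only: return the OID of the unique index on the
     * referenced table that backs the given FK constraint.
     */
    static Oid
    get_fkey_constraint_index(Oid conoid)
    {
        HeapTuple   tp;
        Oid         result = InvalidOid;

        tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
        if (HeapTupleIsValid(tp))
        {
            Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);

            Assert(contup->contype == CONSTRAINT_FOREIGN);
            result = contup->conindid;
            ReleaseSysCache(tp);
        }
        return result;
    }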
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v14-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch (application/octet-stream)
From 1bd70ca0c434a364e55bc16ec3edb6c810527435 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 15 Nov 2021 18:22:33 +0900
Subject: [PATCH v14 1/2] Add isolation tests for snapshot behavior in
ri_triggers.c
They are to check the behavior of RI_FKey_check() and
ri_Check_Pk_Match(). A test case whereby RI_FKey_check() queries a
partitioned PK table under REPEATABLE READ isolation produces wrong
output due to a bug of the partition-descriptor logic and that is
noted as such in the comment above the test. A subsequent patch
will fix the bug and replace the buggy output by the correct one.
---
src/test/isolation/expected/fk-snapshot.out | 124 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/fk-snapshot.spec | 61 ++++++++++
3 files changed, 186 insertions(+)
create mode 100644 src/test/isolation/expected/fk-snapshot.out
create mode 100644 src/test/isolation/specs/fk-snapshot.spec
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
new file mode 100644
index 0000000000..5faf80d6ce
--- /dev/null
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -0,0 +1,124 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+ERROR: insert or update on table "fk_parted_pk_2" violates foreign key constraint "fk_parted_pk_a_fkey"
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+2
+(2 rows)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brr s1dfp s1ifp1 s1c s1sfn
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s1brc s1dfp s1ifp1 s1c s1sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 0dae483e82..2b89acb54d 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -33,6 +33,7 @@ test: fk-deadlock
test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
+test: fk-snapshot
test: eval-plan-qual
test: eval-plan-qual-trigger
test: lock-update-delete
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
new file mode 100644
index 0000000000..378507fbc3
--- /dev/null
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -0,0 +1,61 @@
+setup
+{
+ CREATE TABLE pk_noparted (
+ a int PRIMARY KEY
+ );
+
+ CREATE TABLE fk_parted_pk (
+ a int PRIMARY KEY REFERENCES pk_noparted ON DELETE CASCADE
+ ) PARTITION BY LIST (a);
+ CREATE TABLE fk_parted_pk_1 PARTITION OF fk_parted_pk FOR VALUES IN (1);
+ CREATE TABLE fk_parted_pk_2 PARTITION OF fk_parted_pk FOR VALUES IN (2);
+
+ CREATE TABLE fk_noparted (
+ a int REFERENCES fk_parted_pk ON DELETE NO ACTION INITIALLY DEFERRED
+ );
+ INSERT INTO pk_noparted VALUES (1);
+ INSERT INTO fk_parted_pk VALUES (1);
+ INSERT INTO fk_noparted VALUES (1);
+}
+
+teardown
+{
+ DROP TABLE pk_noparted, fk_parted_pk, fk_noparted;
+}
+
+session s1
+step s1brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s1brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s1ifp2 { INSERT INTO fk_parted_pk VALUES (2); }
+step s1ifp1 { INSERT INTO fk_parted_pk VALUES (1); }
+step s1dfp { DELETE FROM fk_parted_pk WHERE a = 1; }
+step s1c { COMMIT; }
+step s1sfp { SELECT * FROM fk_parted_pk; }
+step s1sp { SELECT * FROM pk_noparted; }
+step s1sfn { SELECT * FROM fk_noparted; }
+
+session s2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s2ip2 { INSERT INTO pk_noparted VALUES (2); }
+step s2ifn2 { INSERT INTO fk_noparted VALUES (2); }
+step s2c { COMMIT; }
+step s2sfp { SELECT * FROM fk_parted_pk; }
+step s2sfn { SELECT * FROM fk_noparted; }
+
+# inserting into referencing tables in transaction-snapshot mode
+# PK table is non-partitioned
+permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+# PK table is partitioned: buggy, because s2's serialization transaction can
+# see the uncommitted row thanks to the latest snapshot taken for
+# partition lookup to work correctly also ends up getting used by the PK index
+# scan
+permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+
+# inserting into referencing tables in up-to-date snapshot mode
+permutation s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+
+# deleting a referenced row and then inserting again in the same transaction; works
+# the same no matter the snapshot mode
+permutation s1brr s1dfp s1ifp1 s1c s1sfn
+permutation s1brc s1dfp s1ifp1 s1c s1sfn
--
2.24.1
v14-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From 46ffe1d6df5eb3545666c8d19a3f7f2d977c25db Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v14 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
This rewrite allows us to fix a PK row visibility bug caused by a
partition descriptor hack which requires ActiveSnapshot to be set to
come up with the correct set of partitions for the RI query running
under REPEATABLE READ isolation. We now set that snapshot
independently of the snapshot to be used by the PK index scan, so the
two no longer interfere. The buggy output in
src/test/isolation/expected/fk-snapshot.out of the relevant test
case has been corrected.
---
src/backend/executor/execPartition.c | 160 +++++-
src/backend/executor/nodeLockRows.c | 160 +++---
src/backend/utils/adt/ri_triggers.c | 573 ++++++++++++--------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 596 insertions(+), 321 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..72ee019330 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -175,8 +175,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -310,7 +311,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1240,12 +1243,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1337,6 +1340,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * find_leaf_part_for_key
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple containing a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+find_leaf_part_for_key(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..ab54a65e0e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in
+ * the slot for EvalPlanQual testing, if the caller asked for it.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index c95cd32402..511697f2ce 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,19 +74,15 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_ONDELETE 3
-#define RI_PLAN_CASCADE_ONUPDATE 4
+#define RI_PLAN_CASCADE_ONDELETE 1
+#define RI_PLAN_CASCADE_ONUPDATE 2
/* For RESTRICT, the same plan can be used for both ON DELETE and ON UPDATE triggers. */
-#define RI_PLAN_RESTRICT 5
-#define RI_PLAN_SETNULL_ONDELETE 6
-#define RI_PLAN_SETNULL_ONUPDATE 7
-#define RI_PLAN_SETDEFAULT_ONDELETE 8
-#define RI_PLAN_SETDEFAULT_ONUPDATE 9
+#define RI_PLAN_RESTRICT 3
+#define RI_PLAN_SETNULL_ONDELETE 4
+#define RI_PLAN_SETNULL_ONUPDATE 5
+#define RI_PLAN_SETDEFAULT_ONDELETE 6
+#define RI_PLAN_SETDEFAULT_ONUPDATE 7
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -229,8 +231,274 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Oid get_fkey_unique_index(Oid conoid);
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap = InvalidSnapshot;
+ bool pushed_latest_snapshot = false;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid saved_userid;
+ int saved_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May neeed to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to the referenced table's owner to perform the operations
+ * below as that user. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * In the case of scanning the PK index for ri_Check_Pk_Match(), we'd like
+ * to see all rows that could be interesting, even those that would not be
+ * visible to the transaction snapshot. To do so, force-push the latest
+ * snapshot.
+ *
+ * Also, increment the command counter to make the changes of the current
+ * command visible in all cases.
+ */
+ CommandCounterIncrement();
+ if (fk_rel == NULL)
+ {
+ snap = GetLatestSnapshot();
+ PushActiveSnapshot(snap);
+ pushed_latest_snapshot = true;
+ }
+ else
+ {
+ snap = GetTransactionSnapshot();
+ PushActiveSnapshot(snap);
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index to search the keys in.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap = InvalidSnapshot;
+
+ /*
+ * XXX the partition descriptor machinery has a hack that assumes that
+ * the queries originating in this module push the latest snapshot in
+ * the transaction-snapshot mode. If we haven't pushed one already, do
+ * so here.
+ */
+ if (!pushed_latest_snapshot)
+ {
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+ }
+
+ leaf_pk_rel = find_leaf_part_for_key(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * XXX done fiddling with the partition descriptor machinery so unset
+ * the active snapshot if we must.
+ */
+ if (mysnap != InvalidSnapshot)
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, neither can the key we're
+ * looking for.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ PopActiveSnapshot();
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Should not occur, because callers currently
+ * take care of the cases in which they do occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ PopActiveSnapshot();
+
+ return found;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by a supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
/*
* RI_FKey_check -
@@ -244,8 +512,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -325,9 +591,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -342,74 +608,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -464,81 +668,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1602,15 +1735,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1827,7 +1955,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1964,26 +2092,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2254,19 +2381,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2299,9 +2418,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2311,46 +2430,17 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
- ri_ExtractValues(source_rel, newslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, newslot, riinfo, true, vals, nulls);
if (oldslot)
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
+ ri_ExtractValues(source_rel, oldslot, riinfo, true,
vals + riinfo->nkeys, nulls + riinfo->nkeys);
}
else
{
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, oldslot, riinfo, true, vals, nulls);
}
/*
@@ -2414,14 +2504,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2452,9 +2540,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2462,22 +2550,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by the caller, assume the violator tuple
+ * came from the relation being complained about.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2579,7 +2665,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2886,7 +2972,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2942,8 +3031,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..e63dcb12f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,5 +124,11 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern Relation find_leaf_part_for_key(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 344399f6a8..8f32353a44 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -241,6 +241,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in nodeLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the fact that the latest snapshot taken
-# for partition lookup to work correctly also ends up getting used by the PK
-# index scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.24.1
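
To make the fk-snapshot.out change above concrete, here is a minimal
sketch of the corrected scenario, run as two psql sessions against the
tables created in fk-snapshot.spec's setup block (it follows the
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn,
with the isolation-tester steps inlined):

-- session 2
insert into pk_noparted values (2);
begin isolation level repeatable read;
select * from fk_parted_pk;  -- first query; takes s2's transaction snapshot

-- session 1
begin isolation level read committed;
insert into fk_parted_pk values (2);
commit;

-- session 2: the FK on fk_noparted is INITIALLY DEFERRED, so this succeeds
insert into fk_noparted values (2);
commit;
ERROR:  insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"

The deferred RI check that runs at COMMIT now uses s2's transaction
snapshot, which cannot see the row committed by session 1 after that
snapshot was taken; before the fix, the latest snapshot pushed for the
partition lookup leaked into the PK index scan and the COMMIT succeeded.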
On Mon, Mar 14, 2022 at 1:33 AM Amit Langote <amitlangote09@gmail.com>
wrote:
> On Tue, Jan 18, 2022 at 3:30 PM Amit Langote <amitlangote09@gmail.com>
> wrote:
> > v13 is attached.
>
> I noticed that the recent 641f3dffcdf's changes to
> get_constraint_index() made it basically unusable for this patch's
> purposes. Reading in the thread that led to 641f3dffcdf why
> get_constraint_index() was changed the way it was, I invented in the
> attached updated patch a get_fkey_constraint_index() that is local to
> ri_triggers.c for use by the new ri_ReferencedKeyExists(), replacing
> get_constraint_index() that no longer gives it the index it's looking
> for.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Hi,

+            partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);

The above can be shortened as:

    partkey_isnull[j] = key_nulls[k] == 'n';

+     * May neeed to cast each of the individual values of the foreign key

neeed -> need

Cheers
On Mon, Mar 14, 2022 at 6:28 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> On Mon, Mar 14, 2022 at 1:33 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Tue, Jan 18, 2022 at 3:30 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > v13 is attached.
> >
> > I noticed that the recent 641f3dffcdf's changes to
> > get_constraint_index() made it basically unusable for this patch's
> > purposes. Reading in the thread that led to 641f3dffcdf why
> > get_constraint_index() was changed the way it was, I invented in the
> > attached updated patch a get_fkey_constraint_index() that is local to
> > ri_triggers.c for use by the new ri_ReferencedKeyExists(), replacing
> > get_constraint_index() that no longer gives it the index it's looking
> > for.
>
> Hi,
>
> +            partkey_isnull[j] = (key_nulls[k] == 'n' ? true : false);
>
> The above can be shortened as:
>
>     partkey_isnull[j] = key_nulls[k] == 'n';
>
> +     * May neeed to cast each of the individual values of the foreign key
>
> neeed -> need

Both fixed, thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v15-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch
From 9824d2801d6082afdff1e0a0091d0b1446258c48 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 15 Nov 2021 18:22:33 +0900
Subject: [PATCH v15 1/2] Add isolation tests for snapshot behavior in
ri_triggers.c
They are to check the behavior of RI_FKey_check() and
ri_Check_Pk_Match(). A test case whereby RI_FKey_check() queries a
partitioned PK table under REPEATABLE READ isolation produces wrong
output due to a bug in the partition-descriptor logic, which is
noted as such in the comment above the test. A subsequent patch
will fix the bug and replace the buggy output with the correct one.
---
src/test/isolation/expected/fk-snapshot.out | 124 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/fk-snapshot.spec | 61 ++++++++++
3 files changed, 186 insertions(+)
create mode 100644 src/test/isolation/expected/fk-snapshot.out
create mode 100644 src/test/isolation/specs/fk-snapshot.spec
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
new file mode 100644
index 0000000000..5faf80d6ce
--- /dev/null
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -0,0 +1,124 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+ERROR: insert or update on table "fk_parted_pk_2" violates foreign key constraint "fk_parted_pk_a_fkey"
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+2
+(2 rows)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brr s1dfp s1ifp1 s1c s1sfn
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s1brc s1dfp s1ifp1 s1c s1sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 8e87098150..6907c342aa 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -33,6 +33,7 @@ test: fk-deadlock
test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
+test: fk-snapshot
test: eval-plan-qual
test: eval-plan-qual-trigger
test: lock-update-delete
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
new file mode 100644
index 0000000000..378507fbc3
--- /dev/null
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -0,0 +1,61 @@
+setup
+{
+ CREATE TABLE pk_noparted (
+ a int PRIMARY KEY
+ );
+
+ CREATE TABLE fk_parted_pk (
+ a int PRIMARY KEY REFERENCES pk_noparted ON DELETE CASCADE
+ ) PARTITION BY LIST (a);
+ CREATE TABLE fk_parted_pk_1 PARTITION OF fk_parted_pk FOR VALUES IN (1);
+ CREATE TABLE fk_parted_pk_2 PARTITION OF fk_parted_pk FOR VALUES IN (2);
+
+ CREATE TABLE fk_noparted (
+ a int REFERENCES fk_parted_pk ON DELETE NO ACTION INITIALLY DEFERRED
+ );
+ INSERT INTO pk_noparted VALUES (1);
+ INSERT INTO fk_parted_pk VALUES (1);
+ INSERT INTO fk_noparted VALUES (1);
+}
+
+teardown
+{
+ DROP TABLE pk_noparted, fk_parted_pk, fk_noparted;
+}
+
+session s1
+step s1brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s1brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s1ifp2 { INSERT INTO fk_parted_pk VALUES (2); }
+step s1ifp1 { INSERT INTO fk_parted_pk VALUES (1); }
+step s1dfp { DELETE FROM fk_parted_pk WHERE a = 1; }
+step s1c { COMMIT; }
+step s1sfp { SELECT * FROM fk_parted_pk; }
+step s1sp { SELECT * FROM pk_noparted; }
+step s1sfn { SELECT * FROM fk_noparted; }
+
+session s2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s2ip2 { INSERT INTO pk_noparted VALUES (2); }
+step s2ifn2 { INSERT INTO fk_noparted VALUES (2); }
+step s2c { COMMIT; }
+step s2sfp { SELECT * FROM fk_parted_pk; }
+step s2sfn { SELECT * FROM fk_noparted; }
+
+# inserting into referencing tables in transaction-snapshot mode
+# PK table is non-partitioned
+permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+# PK table is partitioned: buggy, because s2's serialization transaction can
+# see the uncommitted row thanks to the fact that the latest snapshot taken
+# for partition lookup to work correctly also ends up getting used by the PK
+# index scan
+permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+
+# inserting into referencing tables in up-to-date snapshot mode
+permutation s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+
+# deleting a referenced row and then inserting again in the same transaction; works
+# the same no matter the snapshot mode
+permutation s1brr s1dfp s1ifp1 s1c s1sfn
+permutation s1brc s1dfp s1ifp1 s1c s1sfn
--
2.24.1
v15-0002-Avoid-using-SPI-for-some-RI-checks.patch
From 437e9c63cafa0263caf204b93cc76f74994903bd Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v15 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation so that they simply scan the foreign key constraint's
unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
This rewrite allows us to fix a PK row visibility bug caused by a
partition descriptor hack which requires ActiveSnapshot to be set to
come up with the correct set of partitions for the RI query running
under REPEATABLE READ isolation. We now set that snapshot
independently of the snapshot to be used by the PK index scan, so the
two no longer interfere. The buggy output in
src/test/isolation/expected/fk-snapshot.out of the relevant test
case has been corrected.
---
src/backend/executor/execPartition.c | 160 +++++-
src/backend/executor/nodeLockRows.c | 160 +++---
src/backend/utils/adt/ri_triggers.c | 573 ++++++++++++--------
src/include/executor/execPartition.h | 6 +
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 596 insertions(+), 321 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..bfdc06eaa9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -175,8 +175,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -310,7 +311,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1240,12 +1243,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1337,6 +1340,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * find_leaf_part_for_key
+ * Finds the leaf partition of a partitioned table 'root_rel' that might
+ * contain the specified key tuple, which contains a subset of the table's
+ * columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Otherwise, the
+ * caller must close the returned relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+find_leaf_part_for_key(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must map
+ * any non-root table's partition key attribute numbers to the root
+ * table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else find its partition in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..ab54a65e0e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. The locked tuple is
+ * loaded into the provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in
+ * the slot for EvalPlanQual testing, if the caller asked for it.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 01d4c22cfc..11fd4ec500 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,19 +74,15 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_ONDELETE 3
-#define RI_PLAN_CASCADE_ONUPDATE 4
+#define RI_PLAN_CASCADE_ONDELETE 1
+#define RI_PLAN_CASCADE_ONUPDATE 2
/* For RESTRICT, the same plan can be used for both ON DELETE and ON UPDATE triggers. */
-#define RI_PLAN_RESTRICT 5
-#define RI_PLAN_SETNULL_ONDELETE 6
-#define RI_PLAN_SETNULL_ONUPDATE 7
-#define RI_PLAN_SETDEFAULT_ONDELETE 8
-#define RI_PLAN_SETDEFAULT_ONUPDATE 9
+#define RI_PLAN_RESTRICT 3
+#define RI_PLAN_SETNULL_ONDELETE 4
+#define RI_PLAN_SETNULL_ONUPDATE 5
+#define RI_PLAN_SETDEFAULT_ONDELETE 6
+#define RI_PLAN_SETDEFAULT_ONUPDATE 7
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -229,8 +231,274 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Oid get_fkey_unique_index(Oid conoid);
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap = InvalidSnapshot;
+ bool pushed_latest_snapshot = false;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid saved_userid;
+ int saved_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to the referenced table's owner to perform the operations
+ * below as that user. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * In the case of scanning the PK index for ri_Check_Pk_Match(), we'd like
+ * to see all rows that could be interesting, even those that would not be
+ * visible to the transaction snapshot. To do so, force-push the latest
+ * snapshot.
+ *
+ * Also, increment the command counter to make the changes of the current
+ * command visible in all cases.
+ */
+ CommandCounterIncrement();
+ if (fk_rel == NULL)
+ {
+ snap = GetLatestSnapshot();
+ PushActiveSnapshot(snap);
+ pushed_latest_snapshot = true;
+ }
+ else
+ {
+ snap = GetTransactionSnapshot();
+ PushActiveSnapshot(snap);
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index in which to search for the key.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap = InvalidSnapshot;
+
+ /*
+ * XXX the partition descriptor machinery has a hack that assumes that
+ * the queries originating in this module push the latest snapshot in
+ * the transaction-snapshot mode. If we haven't pushed one already, do
+ * so here.
+ */
+ if (!pushed_latest_snapshot)
+ {
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+ }
+
+ leaf_pk_rel = find_leaf_part_for_key(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * XXX done fiddling with the partition descriptor machinery, so pop
+ * the active snapshot if we pushed one above.
+ */
+ if (mysnap != InvalidSnapshot)
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking for
+ * cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ PopActiveSnapshot();
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Nulls should not get here, because callers
+ * currently handle the cases in which they occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ PopActiveSnapshot();
+
+ return found;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by a supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
/*
* RI_FKey_check -
@@ -244,8 +512,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -325,9 +591,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -342,74 +608,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -464,81 +668,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1608,15 +1741,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1833,7 +1961,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1970,26 +2098,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2260,19 +2387,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2305,9 +2424,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2317,46 +2436,17 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
- ri_ExtractValues(source_rel, newslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, newslot, riinfo, true, vals, nulls);
if (oldslot)
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
+ ri_ExtractValues(source_rel, oldslot, riinfo, true,
vals + riinfo->nkeys, nulls + riinfo->nkeys);
}
else
{
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, oldslot, riinfo, true, vals, nulls);
}
/*
@@ -2420,14 +2510,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2458,9 +2546,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2468,22 +2556,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2585,7 +2671,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2892,7 +2978,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2948,8 +3037,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * Don't need to cast if the values that will be passed to the
+ * operator will be of expected operand type(s). The operator can be
+ * cross-type (such as when called by ri_ReferencedKeyExists()), in
+ * which case, we only need the cast if the right operand value
+ * doesn't match the type expected by the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..e63dcb12f6 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,5 +124,11 @@ extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
int nsubplans);
+extern Relation find_leaf_part_for_key(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82925b4b63..2e1052eeb8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because the latest snapshot, taken so that
-# the partition lookup works correctly, also ends up getting used by the PK
-# index scan, letting s2's repeatable-read transaction see a row that its
-# transaction snapshot should not see
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.24.1
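To spell out the fk-snapshot.out change above: the affected permutation
runs roughly the following sequence (tables as defined in
fk-snapshot.spec). With the patch, the RI check for fk_noparted's
deferred FK constraint, which runs at s2's COMMIT, uses s2's transaction
snapshot, so it now fails instead of seeing the row that s1 committed
after that snapshot was taken:

-- session s2
INSERT INTO pk_noparted VALUES (2);     -- s2ip2 (autocommitted)
BEGIN ISOLATION LEVEL REPEATABLE READ;  -- s2brr
-- session s1
BEGIN ISOLATION LEVEL READ COMMITTED;   -- s1brc
INSERT INTO fk_parted_pk VALUES (2);    -- s1ifp2
-- session s2
SELECT * FROM fk_parted_pk;             -- s2sfp: returns only 1; snapshot taken here
-- session s1
COMMIT;                                 -- s1c
-- session s2
SELECT * FROM fk_parted_pk;             -- s2sfp: still returns only 1
INSERT INTO fk_noparted VALUES (2);     -- s2ifn2: FK check is deferred
COMMIT;                                 -- s2c: deferred check fails; the referenced
                                        -- row is invisible to s2's snapshot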
There were rebase conflicts with the recently committed
execPartition.c/h changes. While fixing them, I thought
find_leaf_part_for_key() didn't quite match the style of its
neighbors in execPartition.h, so I renamed it to
ExecGetLeafPartitionForKey().
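For reference, the renamed function keeps the same parameter list as
before; modulo whitespace, its declaration now reads:

extern Relation ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
                                           const AttrNumber *key_attnums,
                                           Datum *key_vals, char *key_nulls,
                                           Oid root_idxoid, int lockmode,
                                           Oid *leaf_idxoid);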
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v15-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch (application/octet-stream)
From cb0f371ecec0ce6a737478098a81620e7edd495c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Mon, 15 Nov 2021 18:22:33 +0900
Subject: [PATCH v15 1/2] Add isolation tests for snapshot behavior in
ri_triggers.c
They are to check the behavior of RI_FKey_check() and
ri_Check_Pk_Match(). A test case in which RI_FKey_check() queries a
partitioned PK table under REPEATABLE READ isolation produces wrong
output due to a bug in the partition-descriptor logic, which is
noted as such in the comment above the test. A subsequent patch
will fix the bug and replace the buggy output with the correct one.
---
src/test/isolation/expected/fk-snapshot.out | 124 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/fk-snapshot.spec | 61 ++++++++++
3 files changed, 186 insertions(+)
create mode 100644 src/test/isolation/expected/fk-snapshot.out
create mode 100644 src/test/isolation/specs/fk-snapshot.spec
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
new file mode 100644
index 0000000000..5faf80d6ce
--- /dev/null
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -0,0 +1,124 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+ERROR: insert or update on table "fk_parted_pk_2" violates foreign key constraint "fk_parted_pk_a_fkey"
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s2brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2ip2: INSERT INTO pk_noparted VALUES (2);
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+(1 row)
+
+step s2c: COMMIT;
+step s1sp: SELECT * FROM pk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+step s1ifp2: INSERT INTO fk_parted_pk VALUES (2);
+step s2brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s2sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+(1 row)
+
+step s1c: COMMIT;
+step s1sfp: SELECT * FROM fk_parted_pk;
+a
+-
+1
+2
+(2 rows)
+
+step s2ifn2: INSERT INTO fk_noparted VALUES (2);
+step s2c: COMMIT;
+step s2sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+2
+(2 rows)
+
+
+starting permutation: s1brr s1dfp s1ifp1 s1c s1sfn
+step s1brr: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
+
+starting permutation: s1brc s1dfp s1ifp1 s1c s1sfn
+step s1brc: BEGIN ISOLATION LEVEL READ COMMITTED;
+step s1dfp: DELETE FROM fk_parted_pk WHERE a = 1;
+step s1ifp1: INSERT INTO fk_parted_pk VALUES (1);
+step s1c: COMMIT;
+step s1sfn: SELECT * FROM fk_noparted;
+a
+-
+1
+(1 row)
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index a48caae228..0b041559ee 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -33,6 +33,7 @@ test: fk-deadlock
test: fk-deadlock2
test: fk-partitioned-1
test: fk-partitioned-2
+test: fk-snapshot
test: eval-plan-qual
test: eval-plan-qual-trigger
test: lock-update-delete
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
new file mode 100644
index 0000000000..378507fbc3
--- /dev/null
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -0,0 +1,61 @@
+setup
+{
+ CREATE TABLE pk_noparted (
+ a int PRIMARY KEY
+ );
+
+ CREATE TABLE fk_parted_pk (
+ a int PRIMARY KEY REFERENCES pk_noparted ON DELETE CASCADE
+ ) PARTITION BY LIST (a);
+ CREATE TABLE fk_parted_pk_1 PARTITION OF fk_parted_pk FOR VALUES IN (1);
+ CREATE TABLE fk_parted_pk_2 PARTITION OF fk_parted_pk FOR VALUES IN (2);
+
+ CREATE TABLE fk_noparted (
+ a int REFERENCES fk_parted_pk ON DELETE NO ACTION INITIALLY DEFERRED
+ );
+ INSERT INTO pk_noparted VALUES (1);
+ INSERT INTO fk_parted_pk VALUES (1);
+ INSERT INTO fk_noparted VALUES (1);
+}
+
+teardown
+{
+ DROP TABLE pk_noparted, fk_parted_pk, fk_noparted;
+}
+
+session s1
+step s1brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s1brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s1ifp2 { INSERT INTO fk_parted_pk VALUES (2); }
+step s1ifp1 { INSERT INTO fk_parted_pk VALUES (1); }
+step s1dfp { DELETE FROM fk_parted_pk WHERE a = 1; }
+step s1c { COMMIT; }
+step s1sfp { SELECT * FROM fk_parted_pk; }
+step s1sp { SELECT * FROM pk_noparted; }
+step s1sfn { SELECT * FROM fk_noparted; }
+
+session s2
+step s2brr { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2brc { BEGIN ISOLATION LEVEL READ COMMITTED; }
+step s2ip2 { INSERT INTO pk_noparted VALUES (2); }
+step s2ifn2 { INSERT INTO fk_noparted VALUES (2); }
+step s2c { COMMIT; }
+step s2sfp { SELECT * FROM fk_parted_pk; }
+step s2sfn { SELECT * FROM fk_noparted; }
+
+# inserting into referencing tables in transaction-snapshot mode
+# PK table is non-partitioned
+permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
+# PK table is partitioned: buggy, because the latest snapshot, taken so that
+# the partition lookup works correctly, also ends up getting used by the PK
+# index scan, letting s2's repeatable-read transaction see a row that its
+# transaction snapshot should not see
+permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
+
+# inserting into referencing tables in up-to-date snapshot mode
+permutation s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
+
+# deleting a referenced row and then inserting again in the same transaction; works
+# the same no matter the snapshot mode
+permutation s1brr s1dfp s1ifp1 s1c s1sfn
+permutation s1brc s1dfp s1ifp1 s1c s1sfn
--
2.24.1
v15-0002-Avoid-using-SPI-for-some-RI-checks.patch (application/octet-stream)
From 9c5659738dbbef3e8dd5edf84952fad42fa5e82c Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Tue, 12 Jan 2021 14:17:31 +0900
Subject: [PATCH v15 2/2] Avoid using SPI for some RI checks
This modifies the subroutines called by RI trigger functions that
want to check if a given referenced value exists in the referenced
relation to simply scan the foreign key constraint's unique index.
That replaces the current way of issuing a
`SELECT 1 FROM referenced_relation WHERE ref_key = $1` query
through SPI to do the same. This saves a lot of work, especially
when inserting into or updating a referencing relation.
This rewrite allows us to fix a PK row visibility bug caused by a
partition descriptor hack which requires ActiveSnapshot to be set to
come up with the correct set of partitions for the RI query running
under REPEATABLE READ isolation. We now set that snapshot
independently of the snapshot to be used by the PK index scan, so
the two no longer interfere. The buggy output of the relevant test
case in src/test/isolation/expected/fk-snapshot.out has been
corrected.
---
src/backend/executor/execPartition.c | 160 +++++-
src/backend/executor/nodeLockRows.c | 160 +++---
src/backend/utils/adt/ri_triggers.c | 573 ++++++++++++--------
src/include/executor/execPartition.h | 7 +-
src/include/executor/executor.h | 9 +
src/test/isolation/expected/fk-snapshot.out | 4 +-
src/test/isolation/specs/fk-snapshot.spec | 5 +-
7 files changed, 596 insertions(+), 322 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 615bd80973..d03644ae09 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -176,8 +176,9 @@ static void FormPartitionKeyDatum(PartitionDispatch pd,
EState *estate,
Datum *values,
bool *isnull);
-static int get_partition_for_tuple(PartitionDispatch pd, Datum *values,
- bool *isnull);
+static int get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull);
static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
Datum *values,
bool *isnull,
@@ -318,7 +319,9 @@ ExecFindPartition(ModifyTableState *mtstate,
* these values, error out.
*/
if (partdesc->nparts == 0 ||
- (partidx = get_partition_for_tuple(dispatch, values, isnull)) < 0)
+ (partidx = get_partition_for_tuple(dispatch->key,
+ dispatch->partdesc,
+ values, isnull)) < 0)
{
char *val_desc;
@@ -1341,12 +1344,12 @@ FormPartitionKeyDatum(PartitionDispatch pd,
* found or -1 if none found.
*/
static int
-get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
+get_partition_for_tuple(PartitionKey key,
+ PartitionDesc partdesc,
+ Datum *values, bool *isnull)
{
int bound_offset;
int part_index = -1;
- PartitionKey key = pd->key;
- PartitionDesc partdesc = pd->partdesc;
PartitionBoundInfo boundinfo = partdesc->boundinfo;
/* Route as appropriate based on partitioning strategy. */
@@ -1438,6 +1441,151 @@ get_partition_for_tuple(PartitionDispatch pd, Datum *values, bool *isnull)
return part_index;
}
+/*
+ * ExecGetLeafPartitionForKey
+ * Finds the leaf partition of the partitioned table 'root_rel' that
+ * might contain the specified key tuple, which contains a subset of the
+ * table's columns (including all of the partition key columns)
+ *
+ * 'key_natts' specifies the number of columns contained in the key,
+ * 'key_attnums' their attribute numbers as defined in 'root_rel', and
+ * 'key_vals' and 'key_nulls' specify the key tuple.
+ *
+ * Returns NULL if no leaf partition is found for the key. Caller must close
+ * the relation.
+ *
+ * This works because the unique key defined on the root relation is required
+ * to contain the partition key columns of all of the ancestors that lead up to
+ * a given leaf partition.
+ */
+Relation
+ExecGetLeafPartitionForKey(Relation root_rel, int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid)
+{
+ Relation rel = root_rel;
+ Oid constr_idxoid = root_idxoid;
+
+ *leaf_idxoid = InvalidOid;
+
+ /*
+ * Descend through partitioned parents to find the leaf partition that
+ * would accept a row with the provided key values, starting with the root
+ * parent.
+ */
+ while (true)
+ {
+ PartitionKey partkey = RelationGetPartitionKey(rel);
+ PartitionDirectory partdir;
+ PartitionDesc partdesc;
+ Datum partkey_vals[PARTITION_MAX_KEYS];
+ bool partkey_isnull[PARTITION_MAX_KEYS];
+ AttrNumber *root_partattrs = partkey->partattrs;
+ int i,
+ j;
+ int partidx;
+ Oid partoid;
+ bool is_leaf;
+
+ /*
+ * Collect partition key values from the unique key.
+ *
+ * Because we only have the root table's copy of pk_attnums, we must
+ * map any non-root table's partition key attribute numbers to the
+ * root table's.
+ */
+ if (rel != root_rel)
+ {
+ /*
+ * map->attnums will contain root table attribute numbers for each
+ * attribute of the current partitioned relation.
+ */
+ AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel),
+ RelationGetDescr(rel));
+
+ if (map)
+ {
+ root_partattrs = palloc(partkey->partnatts *
+ sizeof(AttrNumber));
+ for (i = 0; i < partkey->partnatts; i++)
+ {
+ AttrNumber partattno = partkey->partattrs[i];
+
+ root_partattrs[i] = map->attnums[partattno - 1];
+ }
+
+ free_attrmap(map);
+ }
+ }
+
+ /*
+ * Referenced key specification does not allow expressions, so there
+ * would not be expressions in the partition keys either.
+ */
+ Assert(partkey->partexprs == NIL);
+ for (i = 0, j = 0; i < partkey->partnatts; i++)
+ {
+ int k;
+
+ for (k = 0; k < key_natts; k++)
+ {
+ if (root_partattrs[i] == key_attnums[k])
+ {
+ partkey_vals[j] = key_vals[k];
+ partkey_isnull[j] = (key_nulls[k] == 'n');
+ j++;
+ break;
+ }
+ }
+ }
+ /* Had better have found values for all of the partition keys. */
+ Assert(j == partkey->partnatts);
+
+ if (root_partattrs != partkey->partattrs)
+ pfree(root_partattrs);
+
+ /* Get the PartitionDesc using the partition directory machinery. */
+ partdir = CreatePartitionDirectory(CurrentMemoryContext, true);
+ partdesc = PartitionDirectoryLookup(partdir, rel);
+
+ /* Find the partition for the key. */
+ partidx = get_partition_for_tuple(partkey, partdesc,
+ partkey_vals, partkey_isnull);
+ Assert(partidx < 0 || partidx < partdesc->nparts);
+
+ /* done using the partition directory */
+ DestroyPartitionDirectory(partdir);
+
+ /* close any intermediate parents we opened */
+ if (rel != root_rel)
+ table_close(rel, NoLock);
+
+ /* No partition found. */
+ if (partidx < 0)
+ return NULL;
+
+ partoid = partdesc->oids[partidx];
+ rel = table_open(partoid, lockmode);
+ constr_idxoid = index_get_partition(rel, constr_idxoid);
+
+ /*
+ * Return if the partition is a leaf, else descend into it in the
+ * next iteration.
+ */
+ is_leaf = partdesc->is_leaf[partidx];
+ if (is_leaf)
+ {
+ *leaf_idxoid = constr_idxoid;
+ return rel;
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
/*
* ExecBuildSlotPartitionKeyDescription
*
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 1a9dab25dd..ab54a65e0e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -79,10 +79,7 @@ lnext:
Datum datum;
bool isNull;
ItemPointerData tid;
- TM_FailureData tmfd;
LockTupleMode lockmode;
- int lockflags = 0;
- TM_Result test;
TupleTableSlot *markSlot;
/* clear any leftover test tuple for this rel */
@@ -179,74 +176,11 @@ lnext:
break;
}
- lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
- if (!IsolationUsesXactSnapshot())
- lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
-
- test = table_tuple_lock(erm->relation, &tid, estate->es_snapshot,
- markSlot, estate->es_output_cid,
- lockmode, erm->waitPolicy,
- lockflags,
- &tmfd);
-
- switch (test)
- {
- case TM_WouldBlock:
- /* couldn't lock tuple in SKIP LOCKED mode */
- goto lnext;
-
- case TM_SelfModified:
-
- /*
- * The target tuple was already updated or deleted by the
- * current command, or by a later command in the current
- * transaction. We *must* ignore the tuple in the former
- * case, so as to avoid the "Halloween problem" of repeated
- * update attempts. In the latter case it might be sensible
- * to fetch the updated tuple instead, but doing so would
- * require changing heap_update and heap_delete to not
- * complain about updating "invisible" tuples, which seems
- * pretty scary (table_tuple_lock will not complain, but few
- * callers expect TM_Invisible, and we're not one of them). So
- * for now, treat the tuple as deleted and do not process.
- */
- goto lnext;
-
- case TM_Ok:
-
- /*
- * Got the lock successfully, the locked tuple saved in
- * markSlot for, if needed, EvalPlanQual testing below.
- */
- if (tmfd.traversed)
- epq_needed = true;
- break;
-
- case TM_Updated:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- elog(ERROR, "unexpected table_tuple_lock status: %u",
- test);
- break;
-
- case TM_Deleted:
- if (IsolationUsesXactSnapshot())
- ereport(ERROR,
- (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
- errmsg("could not serialize access due to concurrent update")));
- /* tuple was deleted so don't return it */
- goto lnext;
-
- case TM_Invisible:
- elog(ERROR, "attempted to lock invisible tuple");
- break;
-
- default:
- elog(ERROR, "unrecognized table_tuple_lock status: %u",
- test);
- }
+ /* skip tuple if it couldn't be locked */
+ if (!ExecLockTableTuple(erm->relation, &tid, markSlot,
+ estate->es_snapshot, estate->es_output_cid,
+ lockmode, erm->waitPolicy, &epq_needed))
+ goto lnext;
/* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
erm->curCtid = tid;
@@ -281,6 +215,90 @@ lnext:
return slot;
}
+/*
+ * ExecLockTableTuple
+ * Locks tuple with the specified TID in lockmode following given wait
+ * policy
+ *
+ * Returns true if the tuple was successfully locked. Locked tuple is loaded
+ * into provided slot.
+ */
+bool
+ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed)
+{
+ TM_FailureData tmfd;
+ int lockflags = TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS;
+ TM_Result test;
+
+ if (!IsolationUsesXactSnapshot())
+ lockflags |= TUPLE_LOCK_FLAG_FIND_LAST_VERSION;
+
+ test = table_tuple_lock(relation, tid, snapshot, slot, cid, lockmode,
+ waitPolicy, lockflags, &tmfd);
+
+ switch (test)
+ {
+ case TM_WouldBlock:
+ /* couldn't lock tuple in SKIP LOCKED mode */
+ return false;
+
+ case TM_SelfModified:
+ /*
+ * The target tuple was already updated or deleted by the
+ * current command, or by a later command in the current
+ * transaction. We *must* ignore the tuple in the former
+ * case, so as to avoid the "Halloween problem" of repeated
+ * update attempts. In the latter case it might be sensible
+ * to fetch the updated tuple instead, but doing so would
+ * require changing heap_update and heap_delete to not
+ * complain about updating "invisible" tuples, which seems
+ * pretty scary (table_tuple_lock will not complain, but few
+ * callers expect TM_Invisible, and we're not one of them). So
+ * for now, treat the tuple as deleted and do not process.
+ */
+ return false;
+
+ case TM_Ok:
+ /*
+ * Got the lock successfully; the locked tuple is saved in
+ * 'slot' for EvalPlanQual testing, if the caller asked for it.
+ */
+ if (tmfd.traversed && epq_needed)
+ *epq_needed = true;
+ break;
+
+ case TM_Updated:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ elog(ERROR, "unexpected table_tuple_lock status: %u",
+ test);
+ break;
+
+ case TM_Deleted:
+ if (IsolationUsesXactSnapshot())
+ ereport(ERROR,
+ (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+ errmsg("could not serialize access due to concurrent update")));
+ /* tuple was deleted so don't return it */
+ return false;
+
+ case TM_Invisible:
+ elog(ERROR, "attempted to lock invisible tuple");
+ return false;
+
+ default:
+ elog(ERROR, "unrecognized table_tuple_lock status: %u", test);
+ return false;
+ }
+
+ return true;
+}
+
/* ----------------------------------------------------------------
* ExecInitLockRows
*
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 01d4c22cfc..3ac7efdfb4 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -23,22 +23,27 @@
#include "postgres.h"
+#include "access/genam.h"
#include "access/htup_details.h"
+#include "access/skey.h"
#include "access/sysattr.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/xact.h"
+#include "catalog/partition.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_type.h"
#include "commands/trigger.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/spi.h"
#include "lib/ilist.h"
#include "miscadmin.h"
#include "parser/parse_coerce.h"
#include "parser/parse_relation.h"
+#include "partitioning/partdesc.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
#include "utils/builtins.h"
@@ -48,6 +53,7 @@
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
+#include "utils/partcache.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/ruleutils.h"
@@ -68,19 +74,15 @@
#define RI_KEYS_NONE_NULL 2
/* RI query type codes */
-/* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK 1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK 2
-#define RI_PLAN_LAST_ON_PK RI_PLAN_CHECK_LOOKUPPK_FROM_PK
/* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_ONDELETE 3
-#define RI_PLAN_CASCADE_ONUPDATE 4
+#define RI_PLAN_CASCADE_ONDELETE 1
+#define RI_PLAN_CASCADE_ONUPDATE 2
/* For RESTRICT, the same plan can be used for both ON DELETE and ON UPDATE triggers. */
-#define RI_PLAN_RESTRICT 5
-#define RI_PLAN_SETNULL_ONDELETE 6
-#define RI_PLAN_SETNULL_ONUPDATE 7
-#define RI_PLAN_SETDEFAULT_ONDELETE 8
-#define RI_PLAN_SETDEFAULT_ONUPDATE 9
+#define RI_PLAN_RESTRICT 3
+#define RI_PLAN_SETNULL_ONDELETE 4
+#define RI_PLAN_SETNULL_ONUPDATE 5
+#define RI_PLAN_SETDEFAULT_ONDELETE 6
+#define RI_PLAN_SETDEFAULT_ONUPDATE 7
#define MAX_QUOTED_NAME_LEN (NAMEDATALEN*2+3)
#define MAX_QUOTED_REL_NAME_LEN (MAX_QUOTED_NAME_LEN*2)
@@ -229,8 +231,274 @@ static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone) pg_attribute_noreturn();
+ bool on_fk, bool partgone) pg_attribute_noreturn();
+static Oid get_fkey_unique_index(Oid conoid);
+/*
+ * Checks whether a tuple containing the unique key as extracted from the
+ * tuple provided in 'slot' exists in 'pk_rel'. The key is extracted using the
+ * constraint's index given in 'riinfo', which is also scanned to check the
+ * existence of the key.
+ *
+ * If 'pk_rel' is a partitioned table, the check is performed on its leaf
+ * partition that would contain the key.
+ *
+ * The provided tuple is either the one being inserted into the referencing
+ * relation ('fk_rel' is non-NULL), or the one being deleted from the
+ * referenced relation, that is, 'pk_rel' ('fk_rel' is NULL).
+ */
+static bool
+ri_ReferencedKeyExists(Relation pk_rel, Relation fk_rel,
+ TupleTableSlot *slot,
+ const RI_ConstraintInfo *riinfo)
+{
+ Oid constr_id = riinfo->constraint_id;
+ Oid idxoid;
+ Relation idxrel;
+ Relation leaf_pk_rel = NULL;
+ int num_pk;
+ int i;
+ bool found = false;
+ const Oid *eq_oprs;
+ Datum pk_vals[INDEX_MAX_KEYS];
+ char pk_nulls[INDEX_MAX_KEYS];
+ ScanKeyData skey[INDEX_MAX_KEYS];
+ Snapshot snap = InvalidSnapshot;
+ bool pushed_latest_snapshot = false;
+ IndexScanDesc scan;
+ TupleTableSlot *outslot;
+ Oid saved_userid;
+ int saved_sec_context;
+ AclResult aclresult;
+
+ /*
+ * Extract the unique key from the provided slot and choose the equality
+ * operators to use when scanning the index below.
+ */
+ if (fk_rel)
+ {
+ ri_ExtractValues(fk_rel, slot, riinfo, false, pk_vals, pk_nulls);
+ /* Use PK = FK equality operator. */
+ eq_oprs = riinfo->pf_eq_oprs;
+
+ /*
+ * May need to cast each of the individual values of the foreign key
+ * to the corresponding PK column's type if the equality operator
+ * demands it.
+ */
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (pk_nulls[i] != 'n')
+ {
+ Oid eq_opr = eq_oprs[i];
+ Oid typeid = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ RI_CompareHashEntry *entry = ri_HashCompareOp(eq_opr, typeid);
+
+ if (OidIsValid(entry->cast_func_finfo.fn_oid))
+ pk_vals[i] = FunctionCall3(&entry->cast_func_finfo,
+ pk_vals[i],
+ Int32GetDatum(-1), /* typmod */
+ BoolGetDatum(false)); /* implicit coercion */
+ }
+ }
+ }
+ else
+ {
+ ri_ExtractValues(pk_rel, slot, riinfo, true, pk_vals, pk_nulls);
+ /* Use PK = PK equality operator. */
+ eq_oprs = riinfo->pp_eq_oprs;
+ }
+
+ /*
+ * Switch to the referenced table's owner to perform the operations
+ * below as that user. This matches what ri_PerformCheck() does.
+ *
+ * Note that as with queries done by ri_PerformCheck(), the way we select
+ * the referenced row below effectively bypasses any RLS policies that may
+ * be present on the referenced table.
+ */
+ GetUserIdAndSecContext(&saved_userid, &saved_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(pk_rel)->relowner,
+ saved_sec_context | SECURITY_LOCAL_USERID_CHANGE);
+
+ /*
+ * Also check that the new user has permissions to look into the schema
+ * of and SELECT from the referenced table.
+ */
+ aclresult = pg_namespace_aclcheck(RelationGetNamespace(pk_rel),
+ GetUserId(), ACL_USAGE);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_SCHEMA,
+ get_namespace_name(RelationGetNamespace(pk_rel)));
+ aclresult = pg_class_aclcheck(RelationGetRelid(pk_rel), GetUserId(),
+ ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult, OBJECT_TABLE,
+ RelationGetRelationName(pk_rel));
+
+ /*
+ * In the case of scanning the PK index for ri_Check_Pk_Match(), we'd like
+ * to see all rows that could be interesting, even those that would not be
+ * visible to the transaction snapshot. To do so, force-push the latest
+ * snapshot.
+ *
+ * Also, increment the command counter to make the changes of the current
+ * command visible in all cases.
+ */
+ CommandCounterIncrement();
+ if (fk_rel == NULL)
+ {
+ snap = GetLatestSnapshot();
+ PushActiveSnapshot(snap);
+ pushed_latest_snapshot = true;
+ }
+ else
+ {
+ snap = GetTransactionSnapshot();
+ PushActiveSnapshot(snap);
+ }
+
+ /*
+ * Open the constraint index to be scanned.
+ *
+ * If the target table is partitioned, we must look up the leaf partition
+ * and its corresponding unique index in which to search for the key.
+ */
+ idxoid = get_fkey_unique_index(constr_id);
+ if (pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ Oid leaf_idxoid;
+ Snapshot mysnap = InvalidSnapshot;
+
+ /*
+ * XXX the partition descriptor machinery has a hack that assumes that
+ * the queries originating in this module push the latest snapshot in
+ * the transaction-snapshot mode. If we haven't pushed one already, do
+ * so here.
+ */
+ if (!pushed_latest_snapshot)
+ {
+ mysnap = GetLatestSnapshot();
+ PushActiveSnapshot(mysnap);
+ }
+
+ leaf_pk_rel = ExecGetLeafPartitionForKey(pk_rel, riinfo->nkeys,
+ riinfo->pk_attnums,
+ pk_vals, pk_nulls,
+ idxoid, RowShareLock,
+ &leaf_idxoid);
+ /*
+ * XXX done fiddling with the partition descriptor machinery, so pop
+ * the active snapshot if we pushed one above.
+ */
+ if (mysnap != InvalidSnapshot)
+ PopActiveSnapshot();
+
+ /*
+ * If no suitable leaf partition exists, the key we're looking for
+ * cannot exist either.
+ */
+ if (leaf_pk_rel == NULL)
+ {
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+ PopActiveSnapshot();
+ return false;
+ }
+
+ pk_rel = leaf_pk_rel;
+ idxoid = leaf_idxoid;
+ }
+ idxrel = index_open(idxoid, RowShareLock);
+
+ /* Set up ScanKeys for the index scan. */
+ num_pk = IndexRelationGetNumberOfKeyAttributes(idxrel);
+ for (i = 0; i < num_pk; i++)
+ {
+ int pkattno = i + 1;
+ Oid operator = eq_oprs[i];
+ Oid opfamily = idxrel->rd_opfamily[i];
+ StrategyNumber strat = get_op_opfamily_strategy(operator, opfamily);
+ RegProcedure regop = get_opcode(operator);
+
+ /* Initialize the scankey. */
+ ScanKeyInit(&skey[i],
+ pkattno,
+ strat,
+ regop,
+ pk_vals[i]);
+
+ skey[i].sk_collation = idxrel->rd_indcollation[i];
+
+ /*
+ * Check for null value. Nulls should not get here, because callers
+ * currently handle the cases in which they occur.
+ */
+ if (pk_nulls[i] == 'n')
+ skey[i].sk_flags |= SK_ISNULL;
+ }
+
+ scan = index_beginscan(pk_rel, idxrel, snap, num_pk, 0);
+ index_rescan(scan, skey, num_pk, NULL, 0);
+
+ /* Look for the tuple, and if found, try to lock it in key share mode. */
+ outslot = table_slot_create(pk_rel, NULL);
+ if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+ {
+ /*
+ * If we fail to lock the tuple for whatever reason, assume it doesn't
+ * exist.
+ */
+ found = ExecLockTableTuple(pk_rel, &(outslot->tts_tid), outslot,
+ snap,
+ GetCurrentCommandId(false),
+ LockTupleKeyShare,
+ LockWaitBlock, NULL);
+ }
+
+ index_endscan(scan);
+ ExecDropSingleTupleTableSlot(outslot);
+
+ /* Don't release lock until commit. */
+ index_close(idxrel, NoLock);
+
+ /* Close leaf partition relation if any. */
+ if (leaf_pk_rel)
+ table_close(leaf_pk_rel, NoLock);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(saved_userid, saved_sec_context);
+
+ PopActiveSnapshot();
+
+ return found;
+}
+
+/*
+ * get_fkey_unique_index
+ * Returns the unique index used by a supposed foreign key constraint
+ */
+static Oid
+get_fkey_unique_index(Oid conoid)
+{
+ Oid result = InvalidOid;
+ HeapTuple tp;
+
+ tp = SearchSysCache1(CONSTROID, ObjectIdGetDatum(conoid));
+ if (HeapTupleIsValid(tp))
+ {
+ Form_pg_constraint contup = (Form_pg_constraint) GETSTRUCT(tp);
+
+ if (contup->contype == CONSTRAINT_FOREIGN)
+ result = contup->conindid;
+ ReleaseSysCache(tp);
+ }
+
+ if (!OidIsValid(result))
+ elog(ERROR, "unique index not found for foreign key constraint %u",
+ conoid);
+
+ return result;
+}
/*
* RI_FKey_check -
@@ -244,8 +512,6 @@ RI_FKey_check(TriggerData *trigdata)
Relation fk_rel;
Relation pk_rel;
TupleTableSlot *newslot;
- RI_QueryKey qkey;
- SPIPlanPtr qplan;
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, false);
@@ -325,9 +591,9 @@ RI_FKey_check(TriggerData *trigdata)
/*
* MATCH PARTIAL - all non-null columns must match. (not
- * implemented, can be done by modifying the query below
- * to only include non-null columns, or by writing a
- * special version here)
+ * implemented, can be done by modifying
+ * ri_ReferencedKeyExists() to only include non-null
+ * columns.)
*/
break;
#endif
@@ -342,74 +608,12 @@ RI_FKey_check(TriggerData *trigdata)
break;
}
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /* Fetch or prepare a saved plan for the real check */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *pk_only;
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * corresponding FK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pf_eq_oprs[i],
- paramname, fk_type);
- querysep = "AND";
- queryoids[i] = fk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * Now check that foreign key exists in PK table
- *
- * XXX detectNewRows must be true when a partitioned table is on the
- * referenced side. The reason is that our snapshot must be fresh in
- * order for the hack in find_inheritance_children() to work.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- NULL, newslot,
- pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE,
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
+ if (!ri_ReferencedKeyExists(pk_rel, fk_rel, newslot, riinfo))
+ ri_ReportViolation(riinfo,
+ pk_rel, fk_rel,
+ newslot,
+ NULL,
+ true, false);
table_close(pk_rel, RowShareLock);
@@ -464,81 +668,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
TupleTableSlot *oldslot,
const RI_ConstraintInfo *riinfo)
{
- SPIPlanPtr qplan;
- RI_QueryKey qkey;
- bool result;
-
/* Only called for non-null rows */
Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
- if (SPI_connect() != SPI_OK_CONNECT)
- elog(ERROR, "SPI_connect failed");
-
- /*
- * Fetch or prepare a saved plan for checking PK table with values coming
- * from a PK row
- */
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
- {
- StringInfoData querybuf;
- char pkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *pk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
-
- /* ----------
- * The query string built is
- * SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
- * FOR KEY SHARE OF x
- * The type id's for the $ parameters are those of the
- * PK attributes themselves.
- * ----------
- */
- initStringInfo(&querybuf);
- pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(pkrelname, pk_rel);
- appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
- pk_only, pkrelname);
- querysep = "WHERE";
- for (int i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
- quoteOneName(attname,
- RIAttName(pk_rel, riinfo->pk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- attname, pk_type,
- riinfo->pp_eq_oprs[i],
- paramname, pk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel);
- }
-
- /*
- * We have a plan now. Run it.
- */
- result = ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- oldslot, NULL,
- true, /* treat like update */
- SPI_OK_SELECT);
-
- if (SPI_finish() != SPI_OK_FINISH)
- elog(ERROR, "SPI_finish failed");
-
- return result;
+ return ri_ReferencedKeyExists(pk_rel, NULL, oldslot, riinfo);
}
@@ -1608,15 +1741,10 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
errtableconstraint(fk_rel,
NameStr(fake_riinfo.conname))));
- /*
- * We tell ri_ReportViolation we were doing the RI_PLAN_CHECK_LOOKUPPK
- * query, which isn't true, but will cause it to use
- * fake_riinfo.fk_attnums as we need.
- */
ri_ReportViolation(&fake_riinfo,
pk_rel, fk_rel,
slot, tupdesc,
- RI_PLAN_CHECK_LOOKUPPK, false);
+ true, false);
ExecDropSingleTupleTableSlot(slot);
}
@@ -1833,7 +1961,7 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
fake_riinfo.pk_attnums[i] = i + 1;
ri_ReportViolation(&fake_riinfo, pk_rel, fk_rel,
- slot, tupdesc, 0, true);
+ slot, tupdesc, true, true);
}
if (SPI_finish() != SPI_OK_FINISH)
@@ -1970,26 +2098,25 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
{
/*
* Inherited constraints with a common ancestor can share ri_query_cache
- * entries for all query types except RI_PLAN_CHECK_LOOKUPPK_FROM_PK.
- * Except in that case, the query processes the other table involved in
- * the FK constraint (i.e., not the table on which the trigger has been
- * fired), and so it will be the same for all members of the inheritance
- * tree. So we may use the root constraint's OID in the hash key, rather
- * than the constraint's own OID. This avoids creating duplicate SPI
- * plans, saving lots of work and memory when there are many partitions
- * with similar FK constraints.
+ * entries, because each query processes the other table involved in the
+ * FK constraint (i.e., not the table on which the trigger has been fired),
+ * and so it will be the same for all members of the inheritance tree. So
+ * we may use the root constraint's OID in the hash key, rather than the
+ * constraint's own OID. This avoids creating duplicate SPI plans, saving
+ * lots of work and memory when there are many partitions with similar FK
+ * constraints.
*
* (Note that we must still have a separate RI_ConstraintInfo for each
* constraint, because partitions can have different column orders,
* resulting in different pk_attnums[] or fk_attnums[] array contents.)
*
+ * (Note also that for a standalone or non-inherited constraint,
+ * constraint_root_id is the same as constraint_id.)
+ *
* We assume struct RI_QueryKey contains no padding bytes, else we'd need
* to use memset to clear them.
*/
- if (constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK)
- key->constr_id = riinfo->constraint_root_id;
- else
- key->constr_id = riinfo->constraint_id;
+ key->constr_id = riinfo->constraint_root_id;
key->constr_queryno = constr_queryno;
}
@@ -2260,19 +2387,11 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
{
SPIPlanPtr qplan;
- Relation query_rel;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel;
Oid save_userid;
int save_sec_context;
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
/* Switch to proper UID to perform check as */
GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
@@ -2305,9 +2424,9 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
TupleTableSlot *oldslot, TupleTableSlot *newslot,
bool detectNewRows, int expect_OK)
{
- Relation query_rel,
- source_rel;
- bool source_is_pk;
+ /* There are currently no queries that run on the PK table. */
+ Relation query_rel = fk_rel,
+ source_rel = pk_rel;
Snapshot test_snapshot;
Snapshot crosscheck_snapshot;
int limit;
@@ -2317,46 +2436,17 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
Datum vals[RI_MAX_NUMKEYS * 2];
char nulls[RI_MAX_NUMKEYS * 2];
- /*
- * Use the query type code to determine whether the query is run against
- * the PK or FK table; we'll do the check as that table's owner
- */
- if (qkey->constr_queryno <= RI_PLAN_LAST_ON_PK)
- query_rel = pk_rel;
- else
- query_rel = fk_rel;
-
- /*
- * The values for the query are taken from the table on which the trigger
- * is called - it is normally the other one with respect to query_rel. An
- * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
- * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK). We might eventually
- * need some less klugy way to determine this.
- */
- if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
- {
- source_rel = fk_rel;
- source_is_pk = false;
- }
- else
- {
- source_rel = pk_rel;
- source_is_pk = true;
- }
-
/* Extract the parameters to be passed into the query */
if (newslot)
{
- ri_ExtractValues(source_rel, newslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, newslot, riinfo, true, vals, nulls);
if (oldslot)
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
+ ri_ExtractValues(source_rel, oldslot, riinfo, true,
vals + riinfo->nkeys, nulls + riinfo->nkeys);
}
else
{
- ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
- vals, nulls);
+ ri_ExtractValues(source_rel, oldslot, riinfo, true, vals, nulls);
}
/*
@@ -2420,14 +2510,12 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
errhint("This is most likely due to a rule having rewritten the query.")));
/* XXX wouldn't it be clearer to do this part at the caller? */
- if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
- expect_OK == SPI_OK_SELECT &&
- (SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
+ if (expect_OK == SPI_OK_SELECT && SPI_processed != 0)
ri_ReportViolation(riinfo,
pk_rel, fk_rel,
newslot ? newslot : oldslot,
NULL,
- qkey->constr_queryno, false);
+ false, false);
return SPI_processed != 0;
}
@@ -2458,9 +2546,9 @@ ri_ExtractValues(Relation rel, TupleTableSlot *slot,
/*
* Produce an error report
*
- * If the failed constraint was on insert/update to the FK table,
- * we want the key names and values extracted from there, and the error
- * message to look like 'key blah is not present in PK'.
+ * If the failed constraint was on insert/update to the FK table (on_fk is
+ * true), we want the key names and values extracted from there, and the
+ * error message to look like 'key blah is not present in PK'.
* Otherwise, the attr names and values come from the PK table and the
* message looks like 'key blah is still referenced from FK'.
*/
@@ -2468,22 +2556,20 @@ static void
ri_ReportViolation(const RI_ConstraintInfo *riinfo,
Relation pk_rel, Relation fk_rel,
TupleTableSlot *violatorslot, TupleDesc tupdesc,
- int queryno, bool partgone)
+ bool on_fk, bool partgone)
{
StringInfoData key_names;
StringInfoData key_values;
- bool onfk;
const int16 *attnums;
Oid rel_oid;
AclResult aclresult;
bool has_perm = true;
/*
- * Determine which relation to complain about. If tupdesc wasn't passed
- * by caller, assume the violator tuple came from there.
+ * If tupdesc wasn't passed by caller, assume the violator tuple came from
+ * there.
*/
- onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
- if (onfk)
+ if (on_fk)
{
attnums = riinfo->fk_attnums;
rel_oid = fk_rel->rd_id;
@@ -2585,7 +2671,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
key_names.data, key_values.data,
RelationGetRelationName(fk_rel)),
errtableconstraint(fk_rel, NameStr(riinfo->conname))));
- else if (onfk)
+ else if (on_fk)
ereport(ERROR,
(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -2892,7 +2978,10 @@ ri_AttributesEqual(Oid eq_opr, Oid typeid,
* ri_HashCompareOp -
*
* See if we know how to compare two values, and create a new hash entry
- * if not.
+ * if not. The entry contains the FmgrInfo of the equality operator function
+ * and that of the cast function, if one is needed to convert the right
+ * operand (whose type OID has been passed) before passing it to the equality
+ * function.
*/
static RI_CompareHashEntry *
ri_HashCompareOp(Oid eq_opr, Oid typeid)
@@ -2948,8 +3037,16 @@ ri_HashCompareOp(Oid eq_opr, Oid typeid)
* moment since that will never be generated for implicit coercions.
*/
op_input_types(eq_opr, &lefttype, &righttype);
- Assert(lefttype == righttype);
- if (typeid == lefttype)
+
+ /*
+ * No cast is needed if the value that will be passed to the
+ * operator is already of the expected operand type. The operator
+ * can be cross-type (such as when called by
+ * ri_ReferencedKeyExists()), in which case we only need the cast
+ * if the right operand value doesn't match the type expected by
+ * the operator.
+ */
+ if ((lefttype == righttype && typeid == lefttype) ||
+ (lefttype != righttype && typeid == righttype))
castfunc = InvalidOid; /* simplest case */
else
{
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..6a69113325 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -31,7 +31,12 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-
+extern Relation ExecGetLeafPartitionForKey(Relation root_rel,
+ int key_natts,
+ const AttrNumber *key_attnums,
+ Datum *key_vals, char *key_nulls,
+ Oid root_idxoid, int lockmode,
+ Oid *leaf_idxoid);
/*
* PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..a69782279b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -243,6 +243,15 @@ extern bool ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
+/*
+ * functions in execLockRows.c
+ */
+
+extern bool ExecLockTableTuple(Relation relation, ItemPointer tid, TupleTableSlot *slot,
+ Snapshot snapshot, CommandId cid,
+ LockTupleMode lockmode, LockWaitPolicy waitPolicy,
+ bool *epq_needed);
+
/* ----------------------------------------------------------------
* ExecProcNode
*
diff --git a/src/test/isolation/expected/fk-snapshot.out b/src/test/isolation/expected/fk-snapshot.out
index 5faf80d6ce..22752cc742 100644
--- a/src/test/isolation/expected/fk-snapshot.out
+++ b/src/test/isolation/expected/fk-snapshot.out
@@ -47,12 +47,12 @@ a
step s2ifn2: INSERT INTO fk_noparted VALUES (2);
step s2c: COMMIT;
+ERROR: insert or update on table "fk_noparted" violates foreign key constraint "fk_noparted_a_fkey"
step s2sfn: SELECT * FROM fk_noparted;
a
-
1
-2
-(2 rows)
+(1 row)
starting permutation: s1brc s2brc s2ip2 s1sp s2c s1sp s1ifp2 s2brc s2sfp s1c s1sfp s2ifn2 s2c s2sfn
diff --git a/src/test/isolation/specs/fk-snapshot.spec b/src/test/isolation/specs/fk-snapshot.spec
index 378507fbc3..64d27f29c3 100644
--- a/src/test/isolation/specs/fk-snapshot.spec
+++ b/src/test/isolation/specs/fk-snapshot.spec
@@ -46,10 +46,7 @@ step s2sfn { SELECT * FROM fk_noparted; }
# inserting into referencing tables in transaction-snapshot mode
# PK table is non-partitioned
permutation s1brr s2brc s2ip2 s1sp s2c s1sp s1ifp2 s1c s1sfp
-# PK table is partitioned: buggy, because s2's serialization transaction can
-# see the uncommitted row thanks to the latest snapshot taken for
-# partition lookup to work correctly also ends up getting used by the PK index
-# scan
+# PK table is partitioned
permutation s2ip2 s2brr s1brc s1ifp2 s2sfp s1c s2sfp s2ifn2 s2c s2sfn
# inserting into referencing tables in up-to-date snapshot mode
--
2.24.1
On Thu, Apr 7, 2022 at 10:05 AM Amit Langote <amitlangote09@gmail.com> wrote:
> There were rebase conflicts with the recently committed
> execPartition.c/h changes. While fixing them, I thought maybe
> find_leaf_part_for_key() doesn't quite match in style with its
> neighbors in execPartition.h, so changed it to
> ExecGetLeafPartitionForKey().
This one has been marked Returned with Feedback in the CF app, which
makes sense given the discussion on -committers [1].
Agree with the feedback given that it would be better to address *all*
RI trigger check/action functions in the project of sidestepping SPI
when doing those checks/actions, not only RI_FKey_check_ins / upd() as
the current patch does. I guess that will require thinking a little
bit harder about how to modularize the new implementation so that the
various trigger functions don't end up with their own bespoke
check/action implementations.
I'll think about that, also consider what Corey proposed in [2], and
try to reformulate this for v16.
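For reference, the check/action functions in ri_triggers.c currently
issue roughly the following SQL shapes over SPI (single-column key
shown). Only the first, the PK-side lookup, is what the current patch
replaces; all of the others scan the FK table:

SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 FOR KEY SHARE OF x  -- check on FK insert/update
SELECT 1 FROM [ONLY] <fktable> x WHERE $1 = fkatt1 FOR KEY SHARE OF x  -- RESTRICT / NO ACTION
DELETE FROM [ONLY] <fktable> WHERE $1 = fkatt1                         -- ON DELETE CASCADE
UPDATE [ONLY] <fktable> SET fkatt1 = $1 WHERE $2 = fkatt1              -- ON UPDATE CASCADE
UPDATE [ONLY] <fktable> SET fkatt1 = NULL WHERE $1 = fkatt1            -- SET NULL
UPDATE [ONLY] <fktable> SET fkatt1 = DEFAULT WHERE $1 = fkatt1         -- SET DEFAULT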
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/E1ncXX2-000mFt-Pe@gemulon.postgresql.org
[2]: /messages/by-id/CADkLM=eZJddpx6RDop-oCrQ+J9R-wfbf6MoLxUUGjbpwTkoUXQ@mail.gmail.com
On Mon, Apr 11, 2022 at 4:47 PM Amit Langote <amitlangote09@gmail.com> wrote:
> This one has been marked Returned with Feedback in the CF app, which
> makes sense given the discussion on -committers [1].
>
> Agree with the feedback given that it would be better to address *all*
> RI trigger check/action functions in the project of sidestepping SPI
> when doing those checks/actions, not only RI_FKey_check_ins / upd() as
> the current patch does. I guess that will require thinking a little
> bit harder about how to modularize the new implementation so that the
> various trigger functions don't end up with their own bespoke
> check/action implementations.
>
> I'll think about that, also consider what Corey proposed in [2], and
> try to reformulate this for v16.
I've been thinking about this and wondering whether the SPI overhead
is big enough in the other cases (those where it is the FK table that
is to be scanned) that it makes sense to replace the actual planner
(invoked via SPI) with a hard-coded mini-planner for the task of
figuring out the best way to scan the FK table for a given PK row
affected by the main query. The planner's involvement seems necessary
in those cases, because the choice of how to scan the FK table is not
as clear-cut as how to scan the PK table.
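To make that concrete, here is a sketch (the table and prepared
statement names are made up for illustration) showing how the generic
plan for the RESTRICT-style FK-side query flips with the FK table's
physical design, which is the judgment the mini-planner would have to
replicate:

create table pk (a int primary key);
create table fk (a int references pk);
insert into pk select generate_series(1, 1000000);
insert into fk select generate_series(1, 1000000);
analyze fk;
set plan_cache_mode = force_generic_plan;  -- PG 12+
prepare riq (int) as
  select 1 from only fk x where $1 = a for key share of x;
explain execute riq (42);  -- LockRows over a Seq Scan: no index on fk.a
create index on fk (a);    -- invalidates the cached generic plan
explain execute riq (42);  -- LockRows over an Index Scan this time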
ISTM, the SPI overhead consists mainly of performing GetCachedPlan()
and executor setup/shutdown, which can seem substantial when compared
to the core task of scanning the PK/FK table, and does add up over
many rows affected by the main query, as seen by the over 2x speedup
for the PK table case gained by shaving it off with the proposed patch
[1]: drop table pk, fk; create table pk (a int primary key); create table fk (a int references pk); insert into pk select generate_series(1, 1000000); insert into fk select i%1000000+1 from generate_series(1, 10000000) i;
its own, even though maybe not as many as by the use of SPI, so the
speedup might be less impressive.
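As an aside, the per-row cost in question is easy to observe: EXPLAIN
ANALYZE on the DML statement reports the total time spent in the RI
trigger, which includes the GetCachedPlan() and executor
setup/shutdown work being discussed. For example, reusing the pk/fk
tables from [1]:

begin;
explain analyze
  insert into fk select i % 1000000 + 1 from generate_series(1, 100000) i;
-- the output ends with a line like:
--   Trigger for constraint fk_a_fkey: time=... calls=100000
rollback;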
Other than coming up with an acceptable implementation for the
mini-planner (maybe we have an example in plan_cluster_use_sort() to
ape), one more challenge is to figure out a way to implement the
CASCADE/SET trigger routines. For those, we might need to introduce
restricted forms of ExecUpdate(), ExecDelete() that can be called
directly, that is, without a full-fledged plan. Not having to worry
about those things does seem like a benefit of just continuing to use
the SPI in those cases.
--
Thanks,
Amit Langote
EDB: http://www.enterprisedb.com
[1]:

drop table pk, fk;
create table pk (a int primary key);
create table fk (a int references pk);
insert into pk select generate_series(1, 1000000);
insert into fk select i%1000000+1 from generate_series(1, 10000000) i;
Time for the last statement:
HEAD: 67566.845 ms (01:07.567)
Patched: 26759.627 ms (00:26.760)