More efficient RI checks - take 2

Started by Antonin Houska almost 6 years ago (35 messages)

#1 Antonin Houska <ah@cybertec.at>
4 attachment(s)

After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same thing in an alternative way. The RI triggers still work as
row-level triggers, but if multiple events of the same kind appear in the
queue, they are all passed to the trigger function at once, so the check query
does not have to be executed that frequently.
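
To illustrate, with the FK constraint from the setup below (f.i referencing
p.i), the check query that the 0004 patch builds for a batch of inserted FK
rows has roughly the following shape (taken from the comment above
RI_FKey_check_query() in the patch; "tgoldtable" is the transient storage
holding the batched rows):

SELECT t.i
    FROM tgoldtable t LEFT JOIN LATERAL
        (SELECT t.i
             FROM ONLY p
             WHERE t.i = p.i
             FOR KEY SHARE OF p) AS m
        ON t.i = m.i
    WHERE m.i ISNULL
    LIMIT 1

That is, a single SELECT per batch which returns the first violating row (if
any), instead of one PK lookup per inserted row.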

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when only a single row needs to be checked. However, I'm not sure
how accurate the single-row measurements are: when a single-row check is
repeated several times, the execution time fluctuates noticeably.
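
(A more trustworthy number for the single-row case could probably be obtained
by averaging many executions, e.g. something like

    echo "INSERT INTO f VALUES (1);" > single_insert.sql
    pgbench -n -f single_insert.sql -t 10000 test

where "single_insert.sql" and the database name "test" are just placeholders.)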

Comments are welcome.

Setup
=====

CREATE TABLE p(i int primary key);
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
CREATE TABLE f(i int REFERENCES p);

Insert many rows into the FK table
==================================

master:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=32.741..32.741 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.403..4.802 rows=16384 loops=1)
Planning Time: 0.050 ms
Trigger for constraint f_i_fkey: time=448.986 calls=16384
Execution Time: 485.444 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=34.053..34.053 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.223..4.448 rows=16384 loops=1)
Planning Time: 0.047 ms
Trigger for constraint f_i_fkey: time=105.164 calls=8
Execution Time: 141.201 ms

Insert a single row into the FK table
=====================================

master:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.026 ms
Trigger for constraint f_i_fkey: time=0.435 calls=1
Execution Time: 0.517 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.025 ms
Trigger for constraint f_i_fkey: time=0.578 calls=1
Execution Time: 0.670 ms

Check if FK row exists during deletion from the PK
==================================================

master:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 3.381 ms

patched:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 5.561 ms

Cascaded DELETE --- many PK rows
================================

DROP TABLE f;
CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=38.334..38.334 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.019..3.925 rows=16384 loops=1)
Planning Time: 0.049 ms
Trigger for constraint f_i_fkey: time=31348.756 calls=16384
Execution Time: 31390.784 ms

patched:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=33.360..33.360 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.012..3.183 rows=16384 loops=1)
Planning Time: 0.094 ms
Trigger for constraint f_i_fkey: time=9.580 calls=8
Execution Time: 43.941 ms

Cascaded DELETE --- a single PK row
===================================

INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 5.754 ms

patched:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 8.098 ms

Cascaded UPDATE - many rows
===========================

master:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.80 rows=16384 width=10) (actual time=166.954..166.954 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.80 rows=16384 width=10) (actual time=0.013..7.780 rows=16384 loops=1)
Planning Time: 0.177 ms
Trigger for constraint f_i_fkey on p: time=60405.362 calls=16384
Trigger for constraint f_i_fkey on f: time=455.874 calls=16384
Execution Time: 61036.996 ms

patched:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.77 rows=16382 width=10) (actual time=159.512..159.512 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.77 rows=16382 width=10) (actual time=0.014..7.783 rows=16382 loops=1)
Planning Time: 0.146 ms
Trigger for constraint f_i_fkey on p: time=169.628 calls=9
Trigger for constraint f_i_fkey on f: time=124.079 calls=2
Execution Time: 456.072 ms

Cascaded UPDATE - a single row
==============================

master:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 4.858 ms

patched:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 11.955 ms

[1]: https://commitfest.postgresql.org/22/1975/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

v01-0002-Changed-ri_GenerateQual-so-it-generates-the-whole-qu.patch
From 58926a4546b3918af8f6e6691956731d8c902701 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Wed, 8 Apr 2020 15:03:20 +0200
Subject: [PATCH 2/4] Changed ri_GenerateQual() so it generates the whole
 qualifier.

This way we can use the function to reduce the amount of copy&pasted code a
bit.
---
 src/backend/utils/adt/ri_triggers.c | 288 +++++++++++++++-------------
 1 file changed, 159 insertions(+), 129 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6220872126..3bedb75846 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -180,11 +180,17 @@ static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
 static Datum ri_set(TriggerData *trigdata, bool is_set_null);
 static void quoteOneName(char *buffer, const char *name);
 static void quoteRelationName(char *buffer, Relation rel);
-static void ri_GenerateQual(StringInfo buf,
-							const char *sep,
-							const char *leftop, Oid leftoptype,
-							Oid opoid,
-							const char *rightop, Oid rightoptype);
+static char *ri_ColNameQuoted(const char *tabname, const char *attname);
+static void ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
+							const char *ltabname, Relation lrel,
+							const int16 *lattnums,
+							const char *rtabname, Relation rrel,
+							const int16 *rattnums, const Oid *eq_oprs);
+static void ri_GenerateQualComponent(StringInfo buf,
+									 const char *sep,
+									 const char *leftop, Oid leftoptype,
+									 Oid opoid,
+									 const char *rightop, Oid rightoptype);
 static void ri_GenerateQualCollation(StringInfo buf, Oid collation);
 static int	ri_NullCheck(TupleDesc tupdesc, TupleTableSlot *slot,
 						 const RI_ConstraintInfo *riinfo, bool rel_is_pk);
@@ -372,10 +378,10 @@ RI_FKey_check(TriggerData *trigdata)
 			quoteOneName(attname,
 						 RIAttName(pk_rel, riinfo->pk_attnums[i]));
 			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							attname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							paramname, fk_type);
+			ri_GenerateQualComponent(&querybuf, querysep,
+									 attname, pk_type,
+									 riinfo->pf_eq_oprs[i],
+									 paramname, fk_type);
 			querysep = "AND";
 			queryoids[i] = fk_type;
 		}
@@ -504,10 +510,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 			quoteOneName(attname,
 						 RIAttName(pk_rel, riinfo->pk_attnums[i]));
 			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							attname, pk_type,
-							riinfo->pp_eq_oprs[i],
-							paramname, pk_type);
+			ri_GenerateQualComponent(&querybuf, querysep,
+									 attname, pk_type,
+									 riinfo->pp_eq_oprs[i],
+									 paramname, pk_type);
 			querysep = "AND";
 			queryoids[i] = pk_type;
 		}
@@ -694,10 +700,10 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 			quoteOneName(attname,
 						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
 			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
+			ri_GenerateQualComponent(&querybuf, querysep,
+									 paramname, pk_type,
+									 riinfo->pf_eq_oprs[i],
+									 attname, fk_type);
 			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
 				ri_GenerateQualCollation(&querybuf, pk_coll);
 			querysep = "AND";
@@ -805,10 +811,10 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 			quoteOneName(attname,
 						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
 			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
+			ri_GenerateQualComponent(&querybuf, querysep,
+									 paramname, pk_type,
+									 riinfo->pf_eq_oprs[i],
+									 attname, fk_type);
 			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
 				ri_GenerateQualCollation(&querybuf, pk_coll);
 			querysep = "AND";
@@ -924,10 +930,10 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 							 "%s %s = $%d",
 							 querysep, attname, i + 1);
 			sprintf(paramname, "$%d", j + 1);
-			ri_GenerateQual(&qualbuf, qualsep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
+			ri_GenerateQualComponent(&qualbuf, qualsep,
+									 paramname, pk_type,
+									 riinfo->pf_eq_oprs[i],
+									 attname, fk_type);
 			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
 				ri_GenerateQualCollation(&querybuf, pk_coll);
 			querysep = ",";
@@ -1104,10 +1110,10 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 							 querysep, attname,
 							 is_set_null ? "NULL" : "DEFAULT");
 			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&qualbuf, qualsep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
+			ri_GenerateQualComponent(&qualbuf, qualsep,
+									 paramname, pk_type,
+									 riinfo->pf_eq_oprs[i],
+									 attname, fk_type);
 			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
 				ri_GenerateQualCollation(&querybuf, pk_coll);
 			querysep = ",";
@@ -1402,31 +1408,13 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 		"" : "ONLY ";
 	appendStringInfo(&querybuf,
-					 " FROM %s%s fk LEFT OUTER JOIN %s%s pk ON",
+					 " FROM %s%s fk LEFT OUTER JOIN %s%s pk ON (",
 					 fk_only, fkrelname, pk_only, pkrelname);
 
-	strcpy(pkattname, "pk.");
-	strcpy(fkattname, "fk.");
-	sep = "(";
-	for (int i = 0; i < riinfo->nkeys; i++)
-	{
-		Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-		Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
-
-		quoteOneName(pkattname + 3,
-					 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-		quoteOneName(fkattname + 3,
-					 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-		ri_GenerateQual(&querybuf, sep,
-						pkattname, pk_type,
-						riinfo->pf_eq_oprs[i],
-						fkattname, fk_type);
-		if (pk_coll != fk_coll)
-			ri_GenerateQualCollation(&querybuf, pk_coll);
-		sep = "AND";
-	}
+	ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+					"pk", pk_rel, riinfo->pk_attnums,
+					"fk", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs);
 
 	/*
 	 * It's sufficient to test any one pk attribute for null to detect a join
@@ -1584,7 +1572,6 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	char	   *constraintDef;
 	char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
 	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-	char		pkattname[MAX_QUOTED_NAME_LEN + 3];
 	char		fkattname[MAX_QUOTED_NAME_LEN + 3];
 	const char *sep;
 	const char *fk_only;
@@ -1633,30 +1620,13 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 		"" : "ONLY ";
 	appendStringInfo(&querybuf,
-					 " FROM %s%s fk JOIN %s pk ON",
+					 " FROM %s%s fk JOIN %s pk ON (",
 					 fk_only, fkrelname, pkrelname);
-	strcpy(pkattname, "pk.");
-	strcpy(fkattname, "fk.");
-	sep = "(";
-	for (i = 0; i < riinfo->nkeys; i++)
-	{
-		Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-		Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
-
-		quoteOneName(pkattname + 3,
-					 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-		quoteOneName(fkattname + 3,
-					 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-		ri_GenerateQual(&querybuf, sep,
-						pkattname, pk_type,
-						riinfo->pf_eq_oprs[i],
-						fkattname, fk_type);
-		if (pk_coll != fk_coll)
-			ri_GenerateQualCollation(&querybuf, pk_coll);
-		sep = "AND";
-	}
+
+	ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+					"pk", pk_rel, riinfo->pk_attnums,
+					"fk", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs);
 
 	/*
 	 * Start the WHERE clause with the partition constraint (except if this is
@@ -1820,7 +1790,44 @@ quoteRelationName(char *buffer, Relation rel)
 }
 
 /*
- * ri_GenerateQual --- generate a WHERE clause equating two variables
+ * ri_GenerateQual --- generate a WHERE/ON clause.
+ *
+ * Note: to avoid unnecessary explicit casts, make sure that the left and
+ * right operands match what eq_oprs expects (i.e. don't swap the left and
+ * right operands accidentally).
+ */
+static void
+ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
+				const char *ltabname, Relation lrel,
+				const int16 *lattnums,
+				const char *rtabname, Relation rrel,
+				const int16 *rattnums,
+				const Oid *eq_oprs)
+{
+	for (int i = 0; i < nkeys; i++)
+	{
+		Oid			ltype = RIAttType(lrel, lattnums[i]);
+		Oid			rtype = RIAttType(rrel, rattnums[i]);
+		Oid			lcoll = RIAttCollation(lrel, lattnums[i]);
+		Oid			rcoll = RIAttCollation(rrel, rattnums[i]);
+		char	   *latt,
+				   *ratt;
+		char	   *sep_current = i > 0 ? sep : NULL;
+
+		latt = ri_ColNameQuoted(ltabname, RIAttName(lrel, lattnums[i]));
+		ratt = ri_ColNameQuoted(rtabname, RIAttName(rrel, rattnums[i]));
+
+		ri_GenerateQualComponent(buf, sep_current, latt, ltype, eq_oprs[i],
+								 ratt, rtype);
+
+		if (lcoll != rcoll)
+			ri_GenerateQualCollation(buf, lcoll);
+	}
+}
+
+/*
+ * ri_GenerateQualComponent --- generate one component of a WHERE/ON clause
+ * equating two variables, to be AND-ed to the other components.
  *
  * This basically appends " sep leftop op rightop" to buf, adding casts
  * and schema qualification as needed to ensure that the parser will select
@@ -1828,17 +1835,86 @@ quoteRelationName(char *buffer, Relation rel)
  * if they aren't variables or parameters.
  */
 static void
-ri_GenerateQual(StringInfo buf,
-				const char *sep,
-				const char *leftop, Oid leftoptype,
-				Oid opoid,
-				const char *rightop, Oid rightoptype)
+ri_GenerateQualComponent(StringInfo buf,
+						 const char *sep,
+						 const char *leftop, Oid leftoptype,
+						 Oid opoid,
+						 const char *rightop, Oid rightoptype)
 {
-	appendStringInfo(buf, " %s ", sep);
+	if (sep)
+		appendStringInfo(buf, " %s ", sep);
 	generate_operator_clause(buf, leftop, leftoptype, opoid,
 							 rightop, rightoptype);
 }
 
+/*
+ * ri_ColNameQuoted() --- return column name, with both table and column name
+ * quoted.
+ */
+static char *
+ri_ColNameQuoted(const char *tabname, const char *attname)
+{
+	char		quoted[MAX_QUOTED_NAME_LEN];
+	StringInfo	result = makeStringInfo();
+
+	if (tabname && strlen(tabname) > 0)
+	{
+		quoteOneName(quoted, tabname);
+		appendStringInfo(result, "%s.", quoted);
+	}
+
+	quoteOneName(quoted, attname);
+	appendStringInfoString(result, quoted);
+
+	return result->data;
+}
+
+/*
+ * Check that RI trigger function was called in expected context
+ */
+static void
+ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
+{
+	TriggerData *trigdata = (TriggerData *) fcinfo->context;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+				 errmsg("function \"%s\" was not called by trigger manager", funcname)));
+
+	/*
+	 * Check proper event
+	 */
+	if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
+		!TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+				 errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
+
+	switch (tgkind)
+	{
+		case RI_TRIGTYPE_INSERT:
+			if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for INSERT", funcname)));
+			break;
+		case RI_TRIGTYPE_UPDATE:
+			if (!TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for UPDATE", funcname)));
+			break;
+
+		case RI_TRIGTYPE_DELETE:
+			if (!TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for DELETE", funcname)));
+			break;
+	}
+}
+
 /*
  * ri_GenerateQualCollation --- add a COLLATE spec to a WHERE clause
  *
@@ -1909,52 +1985,6 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
 	key->constr_queryno = constr_queryno;
 }
 
-/*
- * Check that RI trigger function was called in expected context
- */
-static void
-ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
-{
-	TriggerData *trigdata = (TriggerData *) fcinfo->context;
-
-	if (!CALLED_AS_TRIGGER(fcinfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-				 errmsg("function \"%s\" was not called by trigger manager", funcname)));
-
-	/*
-	 * Check proper event
-	 */
-	if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
-		!TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
-		ereport(ERROR,
-				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-				 errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
-
-	switch (tgkind)
-	{
-		case RI_TRIGTYPE_INSERT:
-			if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-						 errmsg("function \"%s\" must be fired for INSERT", funcname)));
-			break;
-		case RI_TRIGTYPE_UPDATE:
-			if (!TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-						 errmsg("function \"%s\" must be fired for UPDATE", funcname)));
-			break;
-		case RI_TRIGTYPE_DELETE:
-			if (!TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-						 errmsg("function \"%s\" must be fired for DELETE", funcname)));
-			break;
-	}
-}
-
-
 /*
  * Fetch the RI_ConstraintInfo struct for the trigger's FK constraint.
  */
-- 
2.20.1

v01-0003-Return-early-from-ri_NullCheck-if-possible.patch
From 8046a7dc0782a6ce95e808853a839ad529d76743 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Wed, 8 Apr 2020 15:03:20 +0200
Subject: [PATCH 3/4] Return early from ri_NullCheck() if possible.

---
 src/backend/utils/adt/ri_triggers.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 3bedb75846..93e46ddf7a 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2541,6 +2541,13 @@ ri_NullCheck(TupleDesc tupDesc,
 			nonenull = false;
 		else
 			allnull = false;
+
+		/*
+		 * If seen both NULL and non-NULL, the next attributes cannot change
+		 * the result.
+		 */
+		if (!nonenull && !allnull)
+			return RI_KEYS_SOME_NULL;
 	}
 
 	if (allnull)
@@ -2549,7 +2556,8 @@ ri_NullCheck(TupleDesc tupDesc,
 	if (nonenull)
 		return RI_KEYS_NONE_NULL;
 
-	return RI_KEYS_SOME_NULL;
+	/* Should not happen. */
+	Assert(false);
 }
 
 
-- 
2.20.1

v01-0001-Check-for-RI-violation-outside-ri_PerformCheck.patch
From 35606a7e4e66f3279f52be941b0b9bce29d73de3 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Wed, 8 Apr 2020 15:03:20 +0200
Subject: [PATCH 1/4] Check for RI violation outside ri_PerformCheck().

---
 src/backend/utils/adt/ri_triggers.c | 40 ++++++++++++++---------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index bb49e80d16..6220872126 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -389,11 +389,16 @@ RI_FKey_check(TriggerData *trigdata)
 	/*
 	 * Now check that foreign key exists in PK table
 	 */
-	ri_PerformCheck(riinfo, &qkey, qplan,
-					fk_rel, pk_rel,
-					NULL, newslot,
-					false,
-					SPI_OK_SELECT);
+	if (!ri_PerformCheck(riinfo, &qkey, qplan,
+						 fk_rel, pk_rel,
+						 NULL, newslot,
+						 false,
+						 SPI_OK_SELECT))
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   newslot,
+						   NULL,
+						   qkey.constr_queryno, false);
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -708,11 +713,16 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	/*
 	 * We have a plan now. Run it to check for existing references.
 	 */
-	ri_PerformCheck(riinfo, &qkey, qplan,
-					fk_rel, pk_rel,
-					oldslot, NULL,
-					true,		/* must detect new rows */
-					SPI_OK_SELECT);
+	if (ri_PerformCheck(riinfo, &qkey, qplan,
+						fk_rel, pk_rel,
+						oldslot, NULL,
+						true,	/* must detect new rows */
+						SPI_OK_SELECT))
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   oldslot,
+						   NULL,
+						   qkey.constr_queryno, false);
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -2288,16 +2298,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 						RelationGetRelationName(fk_rel)),
 				 errhint("This is most likely due to a rule having rewritten the query.")));
 
-	/* XXX wouldn't it be clearer to do this part at the caller? */
-	if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
-		expect_OK == SPI_OK_SELECT &&
-		(SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
-		ri_ReportViolation(riinfo,
-						   pk_rel, fk_rel,
-						   newslot ? newslot : oldslot,
-						   NULL,
-						   qkey->constr_queryno, false);
-
 	return SPI_processed != 0;
 }
 
-- 
2.20.1

v01-0004-Process-multiple-RI-trigger-events-at-a-time.patch
From 08f181a6f411ddc79a1d0ecabc26d430cf83221e Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Wed, 8 Apr 2020 15:03:20 +0200
Subject: [PATCH 4/4] Process multiple RI trigger events at a time.

It should be more efficient to execute the check query once for multiple rows
inserted / updated / deleted than to run the query again for every single row.

Separate storage is used for the RI trigger events because the "transient
table" that we provide to statement triggers would not be available for
deferred constraints. So the RI triggers still work at row level, although the
rows are processed in batches.
---
 src/backend/commands/tablecmds.c    |   53 +-
 src/backend/commands/trigger.c      |  406 +++++++--
 src/backend/executor/spi.c          |   16 +-
 src/backend/utils/adt/ri_triggers.c | 1270 +++++++++++++++++----------
 src/include/commands/trigger.h      |   23 +
 5 files changed, 1200 insertions(+), 568 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c8c88be2c9..feccb93b18 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10225,6 +10225,11 @@ validateForeignKeyConstraint(char *conname,
 	MemoryContext oldcxt;
 	MemoryContext perTupCxt;
 
+	LOCAL_FCINFO(fcinfo, 0);
+	TriggerData trigdata = {0};
+	ResourceOwner saveResourceOwner;
+	Tuplestorestate *table;
+
 	ereport(DEBUG1,
 			(errmsg("validating foreign key constraint \"%s\"", conname)));
 
@@ -10259,6 +10264,11 @@ validateForeignKeyConstraint(char *conname,
 	slot = table_slot_create(rel, NULL);
 	scan = table_beginscan(rel, snapshot, 0, NULL);
 
+	saveResourceOwner = CurrentResourceOwner;
+	CurrentResourceOwner = CurTransactionResourceOwner;
+	table = tuplestore_begin_heap(false, false, work_mem);
+	CurrentResourceOwner = saveResourceOwner;
+
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -10266,34 +10276,33 @@ validateForeignKeyConstraint(char *conname,
 
 	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 	{
-		LOCAL_FCINFO(fcinfo, 0);
-		TriggerData trigdata = {0};
-
 		CHECK_FOR_INTERRUPTS();
 
-		/*
-		 * Make a call to the trigger function
-		 *
-		 * No parameters are passed, but we do set a context
-		 */
-		MemSet(fcinfo, 0, SizeForFunctionCallInfo(0));
+		tuplestore_puttupleslot(table, slot);
 
-		/*
-		 * We assume RI_FKey_check_ins won't look at flinfo...
-		 */
-		trigdata.type = T_TriggerData;
-		trigdata.tg_event = TRIGGER_EVENT_INSERT | TRIGGER_EVENT_ROW;
-		trigdata.tg_relation = rel;
-		trigdata.tg_trigtuple = ExecFetchSlotHeapTuple(slot, false, NULL);
-		trigdata.tg_trigslot = slot;
-		trigdata.tg_trigger = &trig;
+		MemoryContextReset(perTupCxt);
+	}
 
-		fcinfo->context = (Node *) &trigdata;
+	/*
+	 * Make a call to the trigger function
+	 *
+	 * No parameters are passed, but we do set a context
+	 */
+	MemSet(fcinfo, 0, SizeForFunctionCallInfo(0));
 
-		RI_FKey_check_ins(fcinfo);
+	/*
+	 * We assume RI_FKey_check_ins won't look at flinfo...
+	 */
+	trigdata.type = T_TriggerData;
+	trigdata.tg_event = TRIGGER_EVENT_INSERT | TRIGGER_EVENT_ROW;
+	trigdata.tg_relation = rel;
+	trigdata.tg_trigslot = slot;
+	trigdata.tg_trigger = &trig;
+	trigdata.tg_oldtable = table;
 
-		MemoryContextReset(perTupCxt);
-	}
+	fcinfo->context = (Node *) &trigdata;
+
+	RI_FKey_check_ins(fcinfo);
 
 	MemoryContextSwitchTo(oldcxt);
 	MemoryContextDelete(perTupCxt);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ed551ab73a..cc18167fb4 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -105,6 +105,8 @@ static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 static void AfterTriggerEnlargeQueryState(void);
 static bool before_stmt_triggers_fired(Oid relid, CmdType cmdType);
 
+static TIDArray *alloc_tid_array(void);
+static void add_tid(TIDArray *ta, ItemPointer item);
 
 /*
  * Create a trigger.  Returns the address of the created trigger.
@@ -3337,10 +3339,14 @@ typedef struct AfterTriggerEventList
 /* Macros to help in iterating over a list of events */
 #define for_each_chunk(cptr, evtlist) \
 	for (cptr = (evtlist).head; cptr != NULL; cptr = cptr->next)
+#define next_event_in_chunk(eptr, cptr) \
+	(AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr))
 #define for_each_event(eptr, cptr) \
 	for (eptr = (AfterTriggerEvent) CHUNK_DATA_START(cptr); \
 		 (char *) eptr < (cptr)->freeptr; \
-		 eptr = (AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr)))
+		 eptr = next_event_in_chunk(eptr, cptr))
+#define is_last_event_in_chunk(eptr, cptr) \
+	((((char *) eptr) + SizeofTriggerEvent(eptr)) >= (cptr)->freeptr)
 /* Use this if no special per-chunk processing is needed */
 #define for_each_event_chunk(eptr, cptr, evtlist) \
 	for_each_chunk(cptr, evtlist) for_each_event(eptr, cptr)
@@ -3488,9 +3494,17 @@ static void AfterTriggerExecute(EState *estate,
 								TriggerDesc *trigdesc,
 								FmgrInfo *finfo,
 								Instrumentation *instr,
+								TriggerData *trig_last,
 								MemoryContext per_tuple_context,
+								MemoryContext batch_context,
 								TupleTableSlot *trig_tuple_slot1,
 								TupleTableSlot *trig_tuple_slot2);
+static void AfterTriggerExecuteRI(EState *estate,
+								  ResultRelInfo *relInfo,
+								  FmgrInfo *finfo,
+								  Instrumentation *instr,
+								  TriggerData *trig_last,
+								  MemoryContext batch_context);
 static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid,
 														 CmdType cmdType);
 static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs);
@@ -3807,13 +3821,16 @@ afterTriggerDeleteHeadEventChunk(AfterTriggersQueryData *qs)
  *	fmgr lookup cache space at the caller level.  (For triggers fired at
  *	the end of a query, we can even piggyback on the executor's state.)
  *
- *	event: event currently being fired.
+ *	event: event currently being fired. Pass NULL if the current batch of RI
+ *		trigger events should be processed.
  *	rel: open relation for event.
  *	trigdesc: working copy of rel's trigger info.
  *	finfo: array of fmgr lookup cache entries (one per trigger in trigdesc).
  *	instr: array of EXPLAIN ANALYZE instrumentation nodes (one per trigger),
  *		or NULL if no instrumentation is wanted.
+ *	trig_last: trigger info used for the last trigger execution.
  *	per_tuple_context: memory context to call trigger function in.
+ *	batch_context: memory context to store tuples for RI triggers.
  *	trig_tuple_slot1: scratch slot for tg_trigtuple (foreign tables only)
  *	trig_tuple_slot2: scratch slot for tg_newtuple (foreign tables only)
  * ----------
@@ -3824,39 +3841,55 @@ AfterTriggerExecute(EState *estate,
 					ResultRelInfo *relInfo,
 					TriggerDesc *trigdesc,
 					FmgrInfo *finfo, Instrumentation *instr,
+					TriggerData *trig_last,
 					MemoryContext per_tuple_context,
+					MemoryContext batch_context,
 					TupleTableSlot *trig_tuple_slot1,
 					TupleTableSlot *trig_tuple_slot2)
 {
 	Relation	rel = relInfo->ri_RelationDesc;
 	AfterTriggerShared evtshared = GetTriggerSharedData(event);
 	Oid			tgoid = evtshared->ats_tgoid;
-	TriggerData LocTriggerData = {0};
 	HeapTuple	rettuple;
-	int			tgindx;
 	bool		should_free_trig = false;
 	bool		should_free_new = false;
+	bool		is_new = false;
 
-	/*
-	 * Locate trigger in trigdesc.
-	 */
-	for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
+	if (trig_last->tg_trigger == NULL)
 	{
-		if (trigdesc->triggers[tgindx].tgoid == tgoid)
+		int			tgindx;
+
+		/*
+		 * Locate trigger in trigdesc.
+		 */
+		for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
 		{
-			LocTriggerData.tg_trigger = &(trigdesc->triggers[tgindx]);
-			break;
+			if (trigdesc->triggers[tgindx].tgoid == tgoid)
+			{
+				trig_last->tg_trigger = &(trigdesc->triggers[tgindx]);
+				trig_last->tgindx = tgindx;
+				break;
+			}
 		}
+		if (trig_last->tg_trigger == NULL)
+			elog(ERROR, "could not find trigger %u", tgoid);
+
+		if (RI_FKey_trigger_type(trig_last->tg_trigger->tgfoid) !=
+			RI_TRIGGER_NONE)
+			trig_last->is_ri_trigger = true;
+
+		is_new = true;
 	}
-	if (LocTriggerData.tg_trigger == NULL)
-		elog(ERROR, "could not find trigger %u", tgoid);
+
+	/* trig_last for non-RI trigger should always be initialized again. */
+	Assert(trig_last->is_ri_trigger || is_new);
 
 	/*
 	 * If doing EXPLAIN ANALYZE, start charging time to this trigger. We want
 	 * to include time spent re-fetching tuples in the trigger cost.
 	 */
-	if (instr)
-		InstrStartNode(instr + tgindx);
+	if (instr && !trig_last->is_ri_trigger)
+		InstrStartNode(instr + trig_last->tgindx);
 
 	/*
 	 * Fetch the required tuple(s).
@@ -3864,6 +3897,9 @@ AfterTriggerExecute(EState *estate,
 	switch (event->ate_flags & AFTER_TRIGGER_TUP_BITS)
 	{
 		case AFTER_TRIGGER_FDW_FETCH:
+			/* Foreign keys are not supported on foreign tables. */
+			Assert(!trig_last->is_ri_trigger);
+
 			{
 				Tuplestorestate *fdw_tuplestore = GetCurrentFDWTuplestore();
 
@@ -3879,6 +3915,8 @@ AfterTriggerExecute(EState *estate,
 			}
 			/* fall through */
 		case AFTER_TRIGGER_FDW_REUSE:
+			/* Foreign keys are not supported on foreign tables. */
+			Assert(!trig_last->is_ri_trigger);
 
 			/*
 			 * Store tuple in the slot so that tg_trigtuple does not reference
@@ -3889,38 +3927,56 @@ AfterTriggerExecute(EState *estate,
 			 * that is stored as a heap tuple, constructed in different memory
 			 * context, in the slot anyway.
 			 */
-			LocTriggerData.tg_trigslot = trig_tuple_slot1;
-			LocTriggerData.tg_trigtuple =
+			trig_last->tg_trigslot = trig_tuple_slot1;
+			trig_last->tg_trigtuple =
 				ExecFetchSlotHeapTuple(trig_tuple_slot1, true, &should_free_trig);
 
 			if ((evtshared->ats_event & TRIGGER_EVENT_OPMASK) ==
 				TRIGGER_EVENT_UPDATE)
 			{
-				LocTriggerData.tg_newslot = trig_tuple_slot2;
-				LocTriggerData.tg_newtuple =
+				trig_last->tg_newslot = trig_tuple_slot2;
+				trig_last->tg_newtuple =
 					ExecFetchSlotHeapTuple(trig_tuple_slot2, true, &should_free_new);
 			}
 			else
 			{
-				LocTriggerData.tg_newtuple = NULL;
+				trig_last->tg_newtuple = NULL;
 			}
 			break;
 
 		default:
 			if (ItemPointerIsValid(&(event->ate_ctid1)))
 			{
-				LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
+				if (!trig_last->is_ri_trigger)
+				{
+					trig_last->tg_trigslot = ExecGetTriggerOldSlot(estate,
+																   relInfo);
 
-				if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid1),
-												   SnapshotAny,
-												   LocTriggerData.tg_trigslot))
-					elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
-				LocTriggerData.tg_trigtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false, &should_free_trig);
+					if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid1),
+													   SnapshotAny,
+													   trig_last->tg_trigslot))
+						elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
+
+					trig_last->tg_trigtuple =
+						ExecFetchSlotHeapTuple(trig_last->tg_trigslot, false,
+											   &should_free_trig);
+				}
+				else
+				{
+					if (trig_last->ri_tids_old == NULL)
+					{
+						MemoryContext oldcxt;
+
+						oldcxt = MemoryContextSwitchTo(batch_context);
+						trig_last->ri_tids_old = alloc_tid_array();
+						MemoryContextSwitchTo(oldcxt);
+					}
+					add_tid(trig_last->ri_tids_old, &(event->ate_ctid1));
+				}
 			}
 			else
 			{
-				LocTriggerData.tg_trigtuple = NULL;
+				trig_last->tg_trigtuple = NULL;
 			}
 
 			/* don't touch ctid2 if not there */
@@ -3928,18 +3984,36 @@ AfterTriggerExecute(EState *estate,
 				AFTER_TRIGGER_2CTID &&
 				ItemPointerIsValid(&(event->ate_ctid2)))
 			{
-				LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
+				if (!trig_last->is_ri_trigger)
+				{
+					trig_last->tg_newslot = ExecGetTriggerNewSlot(estate,
+																  relInfo);
 
-				if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid2),
-												   SnapshotAny,
-												   LocTriggerData.tg_newslot))
-					elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
-				LocTriggerData.tg_newtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_newslot, false, &should_free_new);
+					if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid2),
+													   SnapshotAny,
+													   trig_last->tg_newslot))
+						elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
+
+					trig_last->tg_newtuple =
+						ExecFetchSlotHeapTuple(trig_last->tg_newslot, false,
+											   &should_free_new);
+				}
+				else
+				{
+					if (trig_last->ri_tids_new == NULL)
+					{
+						MemoryContext oldcxt;
+
+						oldcxt = MemoryContextSwitchTo(batch_context);
+						trig_last->ri_tids_new = alloc_tid_array();
+						MemoryContextSwitchTo(oldcxt);
+					}
+					add_tid(trig_last->ri_tids_new, &(event->ate_ctid2));
+				}
 			}
 			else
 			{
-				LocTriggerData.tg_newtuple = NULL;
+				trig_last->tg_newtuple = NULL;
 			}
 	}
 
@@ -3949,19 +4023,26 @@ AfterTriggerExecute(EState *estate,
 	 * a trigger, mark it "closed" so that it cannot change anymore.  If any
 	 * additional events of the same type get queued in the current trigger
 	 * query level, they'll go into new transition tables.
+	 *
+	 * RI triggers treat the tuplestores specially, see above.
 	 */
-	LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
+	if (!trig_last->is_ri_trigger)
+		trig_last->tg_oldtable = trig_last->tg_newtable = NULL;
+
 	if (evtshared->ats_table)
 	{
-		if (LocTriggerData.tg_trigger->tgoldtable)
+		/* There shouldn't be any transition table for an RI trigger event. */
+		Assert(!trig_last->is_ri_trigger);
+
+		if (trig_last->tg_trigger->tgoldtable)
 		{
-			LocTriggerData.tg_oldtable = evtshared->ats_table->old_tuplestore;
+			trig_last->tg_oldtable = evtshared->ats_table->old_tuplestore;
 			evtshared->ats_table->closed = true;
 		}
 
-		if (LocTriggerData.tg_trigger->tgnewtable)
+		if (trig_last->tg_trigger->tgnewtable)
 		{
-			LocTriggerData.tg_newtable = evtshared->ats_table->new_tuplestore;
+			trig_last->tg_newtable = evtshared->ats_table->new_tuplestore;
 			evtshared->ats_table->closed = true;
 		}
 	}
@@ -3969,54 +4050,139 @@ AfterTriggerExecute(EState *estate,
 	/*
 	 * Setup the remaining trigger information
 	 */
-	LocTriggerData.type = T_TriggerData;
-	LocTriggerData.tg_event =
-		evtshared->ats_event & (TRIGGER_EVENT_OPMASK | TRIGGER_EVENT_ROW);
-	LocTriggerData.tg_relation = rel;
-	if (TRIGGER_FOR_UPDATE(LocTriggerData.tg_trigger->tgtype))
-		LocTriggerData.tg_updatedcols = evtshared->ats_modifiedcols;
-
-	MemoryContextReset(per_tuple_context);
+	if (is_new)
+	{
+		trig_last->type = T_TriggerData;
+		trig_last->tg_event =
+			evtshared->ats_event & (TRIGGER_EVENT_OPMASK | TRIGGER_EVENT_ROW);
+		trig_last->tg_relation = rel;
+		if (TRIGGER_FOR_UPDATE(trig_last->tg_trigger->tgtype))
+			trig_last->tg_updatedcols = evtshared->ats_modifiedcols;
+	}
 
 	/*
-	 * Call the trigger and throw away any possibly returned updated tuple.
-	 * (Don't let ExecCallTriggerFunc measure EXPLAIN time.)
+	 * RI triggers are executed in batches, see the top of the function.
 	 */
-	rettuple = ExecCallTriggerFunc(&LocTriggerData,
-								   tgindx,
-								   finfo,
-								   NULL,
-								   per_tuple_context);
-	if (rettuple != NULL &&
-		rettuple != LocTriggerData.tg_trigtuple &&
-		rettuple != LocTriggerData.tg_newtuple)
-		heap_freetuple(rettuple);
+	if (!trig_last->is_ri_trigger)
+	{
+		MemoryContextReset(per_tuple_context);
+
+		/*
+		 * Call the trigger and throw away any possibly returned updated
+		 * tuple. (Don't let ExecCallTriggerFunc measure EXPLAIN time.)
+		 */
+		rettuple = ExecCallTriggerFunc(trig_last,
+									   trig_last->tgindx,
+									   finfo,
+									   NULL,
+									   per_tuple_context);
+		if (rettuple != NULL &&
+			rettuple != trig_last->tg_trigtuple &&
+			rettuple != trig_last->tg_newtuple)
+			heap_freetuple(rettuple);
+	}
 
 	/*
 	 * Release resources
 	 */
 	if (should_free_trig)
-		heap_freetuple(LocTriggerData.tg_trigtuple);
+		heap_freetuple(trig_last->tg_trigtuple);
 	if (should_free_new)
-		heap_freetuple(LocTriggerData.tg_newtuple);
+		heap_freetuple(trig_last->tg_newtuple);
 
-	/* don't clear slots' contents if foreign table */
-	if (trig_tuple_slot1 == NULL)
+	/*
+	 * Don't clear slots' contents if foreign table.
+	 *
+	 * For RI triggers we manage these slots separately, see
+	 * AfterTriggerExecuteRI().
+	 */
+	if (trig_tuple_slot1 == NULL && !trig_last->is_ri_trigger)
 	{
-		if (LocTriggerData.tg_trigslot)
-			ExecClearTuple(LocTriggerData.tg_trigslot);
-		if (LocTriggerData.tg_newslot)
-			ExecClearTuple(LocTriggerData.tg_newslot);
+		if (trig_last->tg_trigslot)
+			ExecClearTuple(trig_last->tg_trigslot);
+		if (trig_last->tg_newslot)
+			ExecClearTuple(trig_last->tg_newslot);
 	}
 
 	/*
 	 * If doing EXPLAIN ANALYZE, stop charging time to this trigger, and count
 	 * one "tuple returned" (really the number of firings).
 	 */
-	if (instr)
-		InstrStopNode(instr + tgindx, 1);
+	if (instr && !trig_last->is_ri_trigger)
+		InstrStopNode(instr + trig_last->tgindx, 1);
+
+	/* RI triggers use trig_last across calls. */
+	if (!trig_last->is_ri_trigger)
+		memset(trig_last, 0, sizeof(TriggerData));
 }
 
+/*
+ * AfterTriggerExecuteRI()
+ *
+ * Execute an RI trigger. It's assumed that AfterTriggerExecute() recognized
+ * RI trigger events and only added them to the batch instead of executing
+ * them. The actual processing of the batch is done by this function.
+ */
+static void
+AfterTriggerExecuteRI(EState *estate,
+					  ResultRelInfo *relInfo,
+					  FmgrInfo *finfo,
+					  Instrumentation *instr,
+					  TriggerData *trig_last,
+					  MemoryContext batch_context)
+{
+	HeapTuple	rettuple;
+
+	/*
+	 * AfterTriggerExecute() must have been called for this trigger already.
+	 */
+	Assert(trig_last->tg_trigger);
+	Assert(trig_last->is_ri_trigger);
+
+	/*
+	 * RI trigger constructs a local tuplestore when it needs it. The point is
+	 * that it might need to check visibility first. If we put the tuples into
+	 * a tuplestore now, it'd be hard to keep pins of the containing buffers,
+	 * and so table_tuple_satisfies_snapshot check wouldn't work.
+	 */
+	Assert(trig_last->tg_oldtable == NULL);
+	Assert(trig_last->tg_newtable == NULL);
+
+	/* Initialize the slots to retrieve the rows by TID. */
+	trig_last->tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
+	trig_last->tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
+
+	if (instr)
+		InstrStartNode(instr + trig_last->tgindx);
+
+	/*
+	 * Call the trigger and throw away any possibly returned updated tuple.
+	 * (Don't let ExecCallTriggerFunc measure EXPLAIN time.)
+	 *
+	 * batch_context already contains the TIDs of the affected rows. The RI
+	 * trigger should also use this context to create the tuplestore for them.
+	 */
+	rettuple = ExecCallTriggerFunc(trig_last,
+								   trig_last->tgindx,
+								   finfo,
+								   NULL,
+								   batch_context);
+	if (rettuple != NULL &&
+		rettuple != trig_last->tg_trigtuple &&
+		rettuple != trig_last->tg_newtuple)
+		heap_freetuple(rettuple);
+
+	if (instr)
+		InstrStopNode(instr + trig_last->tgindx, 1);
+
+	ExecClearTuple(trig_last->tg_trigslot);
+	ExecClearTuple(trig_last->tg_newslot);
+
+	MemoryContextReset(batch_context);
+
+	memset(trig_last, 0, sizeof(TriggerData));
+	return;
+}
 
 /*
  * afterTriggerMarkEvents()
@@ -4112,7 +4278,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 {
 	bool		all_fired = true;
 	AfterTriggerEventChunk *chunk;
-	MemoryContext per_tuple_context;
+	MemoryContext per_tuple_context,
+				batch_context;
 	bool		local_estate = false;
 	ResultRelInfo *rInfo = NULL;
 	Relation	rel = NULL;
@@ -4121,6 +4288,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 	Instrumentation *instr = NULL;
 	TupleTableSlot *slot1 = NULL,
 			   *slot2 = NULL;
+	TriggerData trig_last;
 
 	/* Make a local EState if need be */
 	if (estate == NULL)
@@ -4134,6 +4302,14 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 		AllocSetContextCreate(CurrentMemoryContext,
 							  "AfterTriggerTupleContext",
 							  ALLOCSET_DEFAULT_SIZES);
+	/* Separate context for a batch of RI trigger events. */
+	batch_context =
+		AllocSetContextCreate(CurrentMemoryContext,
+							  "AfterTriggerBatchContext",
+							  ALLOCSET_DEFAULT_SIZES);
+
+	/* No trigger executed yet in this batch. */
+	memset(&trig_last, 0, sizeof(TriggerData));
 
 	for_each_chunk(chunk, *events)
 	{
@@ -4150,6 +4326,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 			if ((event->ate_flags & AFTER_TRIGGER_IN_PROGRESS) &&
 				evtshared->ats_firing_id == firing_id)
 			{
+				bool		fire_ri_batch = false;
+
 				/*
 				 * So let's fire it... but first, find the correct relation if
 				 * this is not the same relation as before.
@@ -4180,12 +4358,60 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				}
 
 				/*
-				 * Fire it.  Note that the AFTER_TRIGGER_IN_PROGRESS flag is
-				 * still set, so recursive examinations of the event list
-				 * won't try to re-fire it.
+				 * Fire it (or add the corresponding tuple(s) to the current
+				 * batch if it's RI trigger).
+				 *
+				 * Note that the AFTER_TRIGGER_IN_PROGRESS flag is still set,
+				 * so recursive examinations of the event list won't try to
+				 * re-fire it.
 				 */
-				AfterTriggerExecute(estate, event, rInfo, trigdesc, finfo, instr,
-									per_tuple_context, slot1, slot2);
+				AfterTriggerExecute(estate, event, rInfo, trigdesc, finfo,
+									instr, &trig_last,
+									per_tuple_context, batch_context,
+									slot1, slot2);
+
+				/*
+				 * RI trigger events are processed in batches, so extra work
+				 * might be needed to finish the current batch. It's important
+				 * to do this before the chunk iteration ends because the
+				 * trigger execution may generate other events.
+				 *
+				 * XXX Implement maximum batch size so that constraint
+				 * violations are reported as soon as possible?
+				 */
+				if (trig_last.tg_trigger && trig_last.is_ri_trigger)
+				{
+					if (is_last_event_in_chunk(event, chunk))
+						fire_ri_batch = true;
+					else
+					{
+						AfterTriggerEvent evtnext;
+						AfterTriggerShared evtshnext;
+
+						/*
+						 * We even need to look ahead because the next event
+						 * might be affected by execution of the current one.
+						 * For example if the next event is an AS trigger
+						 * event to be cancelled (cancel_prior_stmt_triggers)
+						 * because the current event, during its execution,
+						 * generates a new AS event for the same trigger.
+						 */
+						evtnext = next_event_in_chunk(event, chunk);
+						evtshnext = GetTriggerSharedData(evtnext);
+
+						if (evtshnext != evtshared)
+							fire_ri_batch = true;
+					}
+				}
+
+				if (fire_ri_batch)
+					AfterTriggerExecuteRI(estate,
+										  rInfo,
+										  finfo,
+										  instr,
+										  &trig_last,
+										  batch_context);
+
 
 				/*
 				 * Mark the event as done.
@@ -4216,6 +4442,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				events->tailfree = chunk->freeptr;
 		}
 	}
+
 	if (slot1 != NULL)
 	{
 		ExecDropSingleTupleTableSlot(slot1);
@@ -4224,6 +4451,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 
 	/* Release working resources */
 	MemoryContextDelete(per_tuple_context);
+	MemoryContextDelete(batch_context);
 
 	if (local_estate)
 	{
@@ -5812,3 +6040,29 @@ pg_trigger_depth(PG_FUNCTION_ARGS)
 {
 	PG_RETURN_INT32(MyTriggerDepth);
 }
+
+static TIDArray *
+alloc_tid_array(void)
+{
+	TIDArray   *result = (TIDArray *) palloc(sizeof(TIDArray));
+
+	/* XXX Tune the chunk size. */
+	result->nmax = 1024;
+	result->tids = (ItemPointer) palloc(result->nmax *
+										sizeof(ItemPointerData));
+	result->n = 0;
+	return result;
+}
+
+static void
+add_tid(TIDArray *ta, ItemPointer item)
+{
+	if (ta->n == ta->nmax)
+	{
+		ta->nmax += 1024;
+		ta->tids = (ItemPointer) repalloc(ta->tids,
+										  ta->nmax * sizeof(ItemPointerData));
+	}
+	memcpy(ta->tids + ta->n, item, sizeof(ItemPointerData));
+	ta->n++;
+}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b108168821..37026219b6 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2929,12 +2929,14 @@ SPI_register_trigger_data(TriggerData *tdata)
 	if (tdata->tg_newtable)
 	{
 		EphemeralNamedRelation enr =
-		palloc(sizeof(EphemeralNamedRelationData));
+		palloc0(sizeof(EphemeralNamedRelationData));
 		int			rc;
 
 		enr->md.name = tdata->tg_trigger->tgnewtable;
-		enr->md.reliddesc = tdata->tg_relation->rd_id;
-		enr->md.tupdesc = NULL;
+		if (tdata->desc)
+			enr->md.tupdesc = tdata->desc;
+		else
+			enr->md.reliddesc = tdata->tg_relation->rd_id;
 		enr->md.enrtype = ENR_NAMED_TUPLESTORE;
 		enr->md.enrtuples = tuplestore_tuple_count(tdata->tg_newtable);
 		enr->reldata = tdata->tg_newtable;
@@ -2946,12 +2948,14 @@ SPI_register_trigger_data(TriggerData *tdata)
 	if (tdata->tg_oldtable)
 	{
 		EphemeralNamedRelation enr =
-		palloc(sizeof(EphemeralNamedRelationData));
+		palloc0(sizeof(EphemeralNamedRelationData));
 		int			rc;
 
 		enr->md.name = tdata->tg_trigger->tgoldtable;
-		enr->md.reliddesc = tdata->tg_relation->rd_id;
-		enr->md.tupdesc = NULL;
+		if (tdata->desc)
+			enr->md.tupdesc = tdata->desc;
+		else
+			enr->md.reliddesc = tdata->tg_relation->rd_id;
 		enr->md.enrtype = ENR_NAMED_TUPLESTORE;
 		enr->md.enrtuples = tuplestore_tuple_count(tdata->tg_oldtable);
 		enr->reldata = tdata->tg_oldtable;
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 93e46ddf7a..8d952e0866 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -69,15 +69,17 @@
 
 /* RI query type codes */
 /* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK			1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK	2
+#define RI_PLAN_CHECK_LOOKUPPK_INS		1
+#define RI_PLAN_CHECK_LOOKUPPK_UPD		2
+#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK	3
 #define RI_PLAN_LAST_ON_PK				RI_PLAN_CHECK_LOOKUPPK_FROM_PK
 /* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE	3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE	4
-#define RI_PLAN_RESTRICT_CHECKREF		5
-#define RI_PLAN_SETNULL_DOUPDATE		6
-#define RI_PLAN_SETDEFAULT_DOUPDATE		7
+#define RI_PLAN_CASCADE_DEL_DODELETE	4
+#define RI_PLAN_CASCADE_UPD_DOUPDATE	5
+#define RI_PLAN_RESTRICT_CHECKREF		6
+#define RI_PLAN_RESTRICT_CHECKREF_NO_ACTION		7
+#define RI_PLAN_SETNULL_DOUPDATE		8
+#define RI_PLAN_SETDEFAULT_DOUPDATE		9
 
 #define MAX_QUOTED_NAME_LEN  (NAMEDATALEN*2+3)
 #define MAX_QUOTED_REL_NAME_LEN  (MAX_QUOTED_NAME_LEN*2)
@@ -114,6 +116,7 @@ typedef struct RI_ConstraintInfo
 	Oid			pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */
 	Oid			pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */
 	Oid			ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */
+	TupleDesc	desc_pk_both;	/* Both OLD an NEW version of PK table row. */
 	dlist_node	valid_link;		/* Link in list of valid entries */
 } RI_ConstraintInfo;
 
@@ -173,10 +176,15 @@ static int	ri_constraint_cache_valid_count = 0;
 /*
  * Local function prototypes
  */
-static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
-							  TupleTableSlot *oldslot,
-							  const RI_ConstraintInfo *riinfo);
+static char *RI_FKey_check_query(const RI_ConstraintInfo *riinfo,
+								 Relation fk_rel, Relation pk_rel,
+								 bool insert);
+static bool RI_FKey_check_query_required(Trigger *trigger, Relation fk_rel,
+										 TupleTableSlot *newslot);
 static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
+static char *ri_restrict_query(const RI_ConstraintInfo *riinfo,
+							   Relation fk_rel, Relation pk_rel,
+							   bool no_action);
 static Datum ri_set(TriggerData *trigdata, bool is_set_null);
 static void quoteOneName(char *buffer, const char *name);
 static void quoteRelationName(char *buffer, Relation rel);
@@ -186,6 +194,9 @@ static void ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
 							const int16 *lattnums,
 							const char *rtabname, Relation rrel,
 							const int16 *rattnums, const Oid *eq_oprs);
+static void ri_GenerateKeyList(StringInfo buf, int nkeys,
+							   const char *tabname, Relation rel,
+							   const int16 *attnums);
 static void ri_GenerateQualComponent(StringInfo buf,
 									 const char *sep,
 									 const char *leftop, Oid leftoptype,
@@ -212,21 +223,37 @@ static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
 							int tgkind);
 static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
 													   Relation trig_rel, bool rel_is_pk);
-static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
-static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
-							   RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid,
+													  Relation trig_rel,
+													  bool rel_is_pk);
+static SPIPlanPtr ri_PlanCheck(const char *querystr, RI_QueryKey *qkey,
+							   Relation fk_rel, Relation pk_rel);
 static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 							RI_QueryKey *qkey, SPIPlanPtr qplan,
 							Relation fk_rel, Relation pk_rel,
-							TupleTableSlot *oldslot, TupleTableSlot *newslot,
 							bool detectNewRows, int expect_OK);
-static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
-							 const RI_ConstraintInfo *riinfo, bool rel_is_pk,
-							 Datum *vals, char *nulls);
 static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 							   Relation pk_rel, Relation fk_rel,
 							   TupleTableSlot *violatorslot, TupleDesc tupdesc,
 							   int queryno, bool partgone) pg_attribute_noreturn();
+static int	ri_register_trigger_data(TriggerData *tdata,
+									 Tuplestorestate *oldtable,
+									 Tuplestorestate *newtable,
+									 TupleDesc desc);
+static Tuplestorestate *get_event_tuplestore(TriggerData *trigdata, bool old,
+											 Snapshot snapshot);
+static Tuplestorestate *get_event_tuplestore_for_cascade_update(TriggerData *trigdata,
+																const RI_ConstraintInfo *riinfo);
+static void add_key_attrs_to_tupdesc(TupleDesc tupdesc,
+									 Relation rel,
+									 const RI_ConstraintInfo *riinfo,
+									 bool old);
+static void add_key_values(TupleTableSlot *slot,
+						   const RI_ConstraintInfo *riinfo,
+						   Relation rel, ItemPointer ip,
+						   Datum *key_values, bool *key_nulls,
+						   Datum *values, bool *nulls, bool old);
+static TupleTableSlot *get_violator_tuple(Relation rel);
 
 
 /*
@@ -240,37 +267,212 @@ RI_FKey_check(TriggerData *trigdata)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *newslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable = NULL;
+	Tuplestorestate *newtable = NULL;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, false);
 
-	if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
-		newslot = trigdata->tg_newslot;
-	else
-		newslot = trigdata->tg_trigslot;
+	/*
+	 * Get the relation descriptors of the FK and PK tables.
+	 *
+	 * pk_rel is opened in RowShareLock mode since that's what our eventual
+	 * SELECT FOR KEY SHARE will get on it.
+	 */
+	fk_rel = trigdata->tg_relation;
+	pk_rel = table_open(riinfo->pk_relid, RowShareLock);
 
 	/*
+	 * Retrieve the changed rows and put them into the appropriate tuplestore.
+	 *
 	 * We should not even consider checking the row if it is no longer valid,
 	 * since it was either deleted (so the deferred check should be skipped)
 	 * or updated (in which case only the latest version of the row should be
-	 * checked).  Test its liveness according to SnapshotSelf.  We need pin
-	 * and lock on the buffer to call HeapTupleSatisfiesVisibility.  Caller
+	 * checked).  Test its liveness according to SnapshotSelf.	We need pin
+	 * and lock on the buffer to call HeapTupleSatisfiesVisibility.	 Caller
 	 * should be holding pin, but not lock.
 	 */
-	if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
-		return PointerGetDatum(NULL);
+	if (TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
+	{
+		if (trigdata->ri_tids_old)
+			oldtable = get_event_tuplestore(trigdata, true, SnapshotSelf);
+		else
+		{
+			/* The whole table is passed if not called from trigger.c */
+			oldtable = trigdata->tg_oldtable;
+		}
+	}
+	else
+	{
+		if (trigdata->ri_tids_new)
+			newtable = get_event_tuplestore(trigdata, false, SnapshotSelf);
+		else
+		{
+			/* The whole table is passed if not called from trigger.c */
+			newtable = trigdata->tg_newtable;
+		}
+	}
+
+	if (SPI_connect() != SPI_OK_CONNECT)
+		elog(ERROR, "SPI_connect failed");
+
+	if (ri_register_trigger_data(trigdata, oldtable, newtable, NULL) !=
+		SPI_OK_TD_REGISTER)
+		elog(ERROR, "ri_register_trigger_data failed");
+
+	if (TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
+	{
+		/* Fetch or prepare a saved plan for the real check */
+		ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_INS);
+
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			char	   *query;
+
+			query = RI_FKey_check_query(riinfo, fk_rel, pk_rel, true);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, &qkey, fk_rel, pk_rel);
+		}
+	}
+	else
+	{
+		Assert((TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event)));
+
+		/* Fetch or prepare a saved plan for the real check */
+		ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_UPD);
+
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			char	   *query;
+
+			query = RI_FKey_check_query(riinfo, fk_rel, pk_rel, false);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, &qkey, fk_rel, pk_rel);
+		}
+	}
 
 	/*
-	 * Get the relation descriptors of the FK and PK tables.
-	 *
-	 * pk_rel is opened in RowShareLock mode since that's what our eventual
-	 * SELECT FOR KEY SHARE will get on it.
+	 * Now check that foreign key exists in PK table
 	 */
-	fk_rel = trigdata->tg_relation;
-	pk_rel = table_open(riinfo->pk_relid, RowShareLock);
+	if (ri_PerformCheck(riinfo, &qkey, qplan,
+						fk_rel, pk_rel,
+						false,
+						SPI_OK_SELECT))
+	{
+		TupleTableSlot *violatorslot = get_violator_tuple(fk_rel);
+
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   violatorslot,
+						   NULL,
+						   qkey.constr_queryno, false);
+	}
+
+	if (SPI_finish() != SPI_OK_FINISH)
+		elog(ERROR, "SPI_finish failed");
+
+	table_close(pk_rel, RowShareLock);
+
+	return PointerGetDatum(NULL);
+}
+
+/* ----------
+ * Construct the query to check inserted/updated rows of the FK table.
+ *
+ * If "insert" is true, the rows are inserted, otherwise they are updated.
+ *
+ * The query string built is
+ *	SELECT t.fkatt1 [, ...]
+ *		FROM <tgtable> t LEFT JOIN LATERAL
+ *		    (SELECT t.fkatt1 [, ...]
+ *               FROM [ONLY] <pktable> p
+ *		         WHERE t.fkatt1 = p.pkatt1 [AND ...]
+ *		         FOR KEY SHARE OF p) AS m
+ *		     ON t.fkatt1 = m.fkatt1 [AND ...]
+ *		WHERE m.fkatt1 ISNULL
+ *	    LIMIT 1
+ *
+ * where <tgtable> is "tgoldtable" for INSERT and "tgnewtable" for UPDATE
+ * events.
+ *
+ * It returns the first row that violates the constraint.
+ *
+ * "m" returns the new rows that do have matching PK row. It is a subquery
+ * because the FOR KEY SHARE clause cannot reference the nullable side of an
+ * outer join.
+ *
+ * XXX "tgoldtable" looks confusing for insert, but that's where
+ * AfterTriggerExecute() stores tuples whose events don't have
+ * AFTER_TRIGGER_2CTID set. For a non-RI trigger, the inserted tuple would
+ * fall into tg_trigtuple as opposed to tg_newtuple, which seems a similar
+ * problem. It doesn't seem worth any renaming or adding extra tuplestores to
+ * TriggerData.
+ * ----------
+ */
+static char *
+RI_FKey_check_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+					Relation pk_rel, bool insert)
+{
+	StringInfo	querybuf = makeStringInfo();
+	StringInfo	subquerybuf = makeStringInfo();
+	char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *pk_only;
+	const char *tgtable;
+	char	   *col_test;
+
+	tgtable = insert ? "tgoldtable" : "tgnewtable";
+
+	quoteRelationName(pkrelname, pk_rel);
+
+	/* Construct the subquery. */
+	pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	appendStringInfoString(subquerybuf,
+						   "(SELECT ");
+	ri_GenerateKeyList(subquerybuf, riinfo->nkeys, "t", fk_rel,
+					   riinfo->fk_attnums);
+	appendStringInfo(subquerybuf,
+					 " FROM %s%s p WHERE ",
+					 pk_only, pkrelname);
+	ri_GenerateQual(subquerybuf, "AND", riinfo->nkeys,
+					"p", pk_rel, riinfo->pk_attnums,
+					"t", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs);
+	appendStringInfoString(subquerybuf, " FOR KEY SHARE OF p) AS m");
+
+	/* Now the main query. */
+	appendStringInfoString(querybuf, "SELECT ");
+	ri_GenerateKeyList(querybuf, riinfo->nkeys, "t", fk_rel,
+					   riinfo->fk_attnums);
+	appendStringInfo(querybuf,
+					 " FROM %s t LEFT JOIN LATERAL %s ON ",
+					 tgtable, subquerybuf->data);
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					"t", fk_rel, riinfo->fk_attnums,
+					"m", fk_rel, riinfo->fk_attnums,
+					riinfo->ff_eq_oprs);
+	col_test = ri_ColNameQuoted("m", RIAttName(fk_rel, riinfo->fk_attnums[0]));
+	appendStringInfo(querybuf, " WHERE %s ISNULL ", col_test);
+	appendStringInfoString(querybuf, " LIMIT 1");
+
+	return querybuf->data;
+}
+
+/*
+ * Check if the PK table needs to be queried (using the query generated by
+ * RI_FKey_check_query).
+ */
+static bool
+RI_FKey_check_query_required(Trigger *trigger, Relation fk_rel,
+							 TupleTableSlot *newslot)
+{
+	const RI_ConstraintInfo *riinfo;
+
+	riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
 
 	switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
 	{
@@ -280,8 +482,7 @@ RI_FKey_check(TriggerData *trigdata)
 			 * No further check needed - an all-NULL key passes every type of
 			 * foreign key constraint.
 			 */
-			table_close(pk_rel, RowShareLock);
-			return PointerGetDatum(NULL);
+			return false;
 
 		case RI_KEYS_SOME_NULL:
 
@@ -305,8 +506,7 @@ RI_FKey_check(TriggerData *trigdata)
 							 errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
 							 errtableconstraint(fk_rel,
 												NameStr(riinfo->conname))));
-					table_close(pk_rel, RowShareLock);
-					return PointerGetDatum(NULL);
+					break;
 
 				case FKCONSTR_MATCH_SIMPLE:
 
@@ -314,17 +514,16 @@ RI_FKey_check(TriggerData *trigdata)
 					 * MATCH SIMPLE - if ANY column is null, the key passes
 					 * the constraint.
 					 */
-					table_close(pk_rel, RowShareLock);
-					return PointerGetDatum(NULL);
+					return false;
 
 #ifdef NOT_USED
 				case FKCONSTR_MATCH_PARTIAL:
 
 					/*
 					 * MATCH PARTIAL - all non-null columns must match. (not
-					 * implemented, can be done by modifying the query below
-					 * to only include non-null columns, or by writing a
-					 * special version here)
+					 * implemented, can be done by modifying the query to only
+					 * include non-null columns, or by writing a special
+					 * version)
 					 */
 					break;
 #endif
@@ -333,85 +532,12 @@ RI_FKey_check(TriggerData *trigdata)
 		case RI_KEYS_NONE_NULL:
 
 			/*
-			 * Have a full qualified key - continue below for all three kinds
-			 * of MATCH.
+			 * Have a full qualified key - regular check is needed.
 			 */
 			break;
 	}
 
-	if (SPI_connect() != SPI_OK_CONNECT)
-		elog(ERROR, "SPI_connect failed");
-
-	/* Fetch or prepare a saved plan for the real check */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
-	{
-		StringInfoData querybuf;
-		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
-		Oid			queryoids[RI_MAX_NUMKEYS];
-		const char *pk_only;
-
-		/* ----------
-		 * The query string built is
-		 *	SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
-		 *		   FOR KEY SHARE OF x
-		 * The type id's for the $ parameters are those of the
-		 * corresponding FK attributes.
-		 * ----------
-		 */
-		initStringInfo(&querybuf);
-		pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(pkrelname, pk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
-						 pk_only, pkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
-			quoteOneName(attname,
-						 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQualComponent(&querybuf, querysep,
-									 attname, pk_type,
-									 riinfo->pf_eq_oprs[i],
-									 paramname, fk_type);
-			querysep = "AND";
-			queryoids[i] = fk_type;
-		}
-		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
-	}
-
-	/*
-	 * Now check that foreign key exists in PK table
-	 */
-	if (!ri_PerformCheck(riinfo, &qkey, qplan,
-						 fk_rel, pk_rel,
-						 NULL, newslot,
-						 false,
-						 SPI_OK_SELECT))
-		ri_ReportViolation(riinfo,
-						   pk_rel, fk_rel,
-						   newslot,
-						   NULL,
-						   qkey.constr_queryno, false);
-
-	if (SPI_finish() != SPI_OK_FINISH)
-		elog(ERROR, "SPI_finish failed");
-
-	table_close(pk_rel, RowShareLock);
-
-	return PointerGetDatum(NULL);
+	return true;
 }
 
 
@@ -447,99 +573,6 @@ RI_FKey_check_upd(PG_FUNCTION_ARGS)
 }
 
 
-/*
- * ri_Check_Pk_Match
- *
- * Check to see if another PK row has been created that provides the same
- * key values as the "oldslot" that's been modified or deleted in our trigger
- * event.  Returns true if a match is found in the PK table.
- *
- * We assume the caller checked that the oldslot contains no NULL key values,
- * since otherwise a match is impossible.
- */
-static bool
-ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
-				  TupleTableSlot *oldslot,
-				  const RI_ConstraintInfo *riinfo)
-{
-	SPIPlanPtr	qplan;
-	RI_QueryKey qkey;
-	bool		result;
-
-	/* Only called for non-null rows */
-	Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
-
-	if (SPI_connect() != SPI_OK_CONNECT)
-		elog(ERROR, "SPI_connect failed");
-
-	/*
-	 * Fetch or prepare a saved plan for checking PK table with values coming
-	 * from a PK row
-	 */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
-
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
-	{
-		StringInfoData querybuf;
-		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
-		const char *pk_only;
-		Oid			queryoids[RI_MAX_NUMKEYS];
-
-		/* ----------
-		 * The query string built is
-		 *	SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
-		 *		   FOR KEY SHARE OF x
-		 * The type id's for the $ parameters are those of the
-		 * PK attributes themselves.
-		 * ----------
-		 */
-		initStringInfo(&querybuf);
-		pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(pkrelname, pk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
-						 pk_only, pkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-
-			quoteOneName(attname,
-						 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQualComponent(&querybuf, querysep,
-									 attname, pk_type,
-									 riinfo->pp_eq_oprs[i],
-									 paramname, pk_type);
-			querysep = "AND";
-			queryoids[i] = pk_type;
-		}
-		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
-
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
-	}
-
-	/*
-	 * We have a plan now. Run it.
-	 */
-	result = ri_PerformCheck(riinfo, &qkey, qplan,
-							 fk_rel, pk_rel,
-							 oldslot, NULL,
-							 true,	/* treat like update */
-							 SPI_OK_SELECT);
-
-	if (SPI_finish() != SPI_OK_FINISH)
-		elog(ERROR, "SPI_finish failed");
-
-	return result;
-}
-
-
 /*
  * RI_FKey_noaction_del -
  *
@@ -626,9 +659,9 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, true);
@@ -641,79 +674,54 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowShareLock);
 	pk_rel = trigdata->tg_relation;
-	oldslot = trigdata->tg_trigslot;
 
-	/*
-	 * If another PK row now exists providing the old key values, we should
-	 * not do anything.  However, this check should only be made in the NO
-	 * ACTION case; in RESTRICT cases we don't wish to allow another row to be
-	 * substituted.
-	 */
-	if (is_no_action &&
-		ri_Check_Pk_Match(pk_rel, fk_rel, oldslot, riinfo))
-	{
-		table_close(fk_rel, RowShareLock);
-		return PointerGetDatum(NULL);
-	}
+	oldtable = get_event_tuplestore(trigdata, true, NULL);
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
-	/*
-	 * Fetch or prepare a saved plan for the restrict lookup (it's the same
-	 * query for delete and update cases)
-	 */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF);
+	if (ri_register_trigger_data(trigdata, oldtable, NULL, NULL) !=
+		SPI_OK_TD_REGISTER)
+		elog(ERROR, "ri_register_trigger_data failed");
 
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+	if (!is_no_action)
 	{
-		StringInfoData querybuf;
-		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
-		Oid			queryoids[RI_MAX_NUMKEYS];
-		const char *fk_only;
-
-		/* ----------
-		 * The query string built is
-		 *	SELECT 1 FROM [ONLY] <fktable> x WHERE $1 = fkatt1 [AND ...]
-		 *		   FOR KEY SHARE OF x
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.
-		 * ----------
+		/*
+		 * Fetch or prepare a saved plan for the restrict lookup (it's the
+		 * same query for delete and update cases)
 		 */
-		initStringInfo(&querybuf);
-		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
-						 fk_only, fkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
+		ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF);
+
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
+			char	   *query;
 
-			quoteOneName(attname,
-						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQualComponent(&querybuf, querysep,
-									 paramname, pk_type,
-									 riinfo->pf_eq_oprs[i],
-									 attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
-			querysep = "AND";
-			queryoids[i] = pk_type;
+			query = ri_restrict_query(riinfo, fk_rel, pk_rel, false);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, &qkey, fk_rel, pk_rel);
 		}
-		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
+	}
+	else
+	{
+		/*
+		 * If another PK row now exists providing the old key values, we
+		 * should not do anything.  However, this check should only be made in
+		 * the NO ACTION case; in RESTRICT cases we don't wish to allow
+		 * another row to be substituted.
+		 */
+		ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF_NO_ACTION);
 
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+
+			char	   *query;
+
+			query = ri_restrict_query(riinfo, fk_rel, pk_rel, true);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, &qkey, fk_rel, pk_rel);
+		}
 	}
 
 	/*
@@ -721,14 +729,17 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	 */
 	if (ri_PerformCheck(riinfo, &qkey, qplan,
 						fk_rel, pk_rel,
-						oldslot, NULL,
 						true,	/* must detect new rows */
 						SPI_OK_SELECT))
+	{
+		TupleTableSlot *violatorslot = get_violator_tuple(pk_rel);
+
 		ri_ReportViolation(riinfo,
 						   pk_rel, fk_rel,
-						   oldslot,
+						   violatorslot,
 						   NULL,
 						   qkey.constr_queryno, false);
+	}
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -738,6 +749,76 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	return PointerGetDatum(NULL);
 }
 
+/* ----------
+ * Construct the query to check whether a deleted row of the PK table is still
+ * referenced by the FK table.
+ *
+ * If "no_action" is false, the query string built is
+ *	SELECT o.pkatt1 [, ...]
+ *		FROM [ONLY] <fktable> f, tgoldtable o
+ *		WHERE f.fkatt1 = o.pkatt1 [AND ...]
+ *		FOR KEY SHARE OF f
+ *		LIMIT 1
+ *
+ * If no_action is true, also check if the row being deleted was re-inserted
+ * into the PK table (or, in case of UPDATE, if a row with the old key is there
+ * again):
+ *
+ *	SELECT o.pkatt1 [, ...]
+ *		FROM [ONLY] <fktable> f, tgoldtable o
+ *		WHERE f.fkatt1 = o.pkatt1 [AND ...] AND	NOT EXISTS
+ *			(SELECT 1
+ *			FROM <pktable> p
+ *			WHERE p.pkatt1 = o.pkatt1 [, ...]
+ *			FOR KEY SHARE OF p)
+ *		FOR KEY SHARE OF f
+ *		LIMIT 1
+ *
+ * TODO Is ONLY needed for the PK table?
+ * ----------
+ */
+static char *
+ri_restrict_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+				  Relation pk_rel, bool no_action)
+{
+	StringInfo	querybuf = makeStringInfo();
+	StringInfo	subquerybuf = NULL;
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *fk_only;
+
+	if (no_action)
+	{
+		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
+
+		subquerybuf = makeStringInfo();
+		quoteRelationName(pkrelname, pk_rel);
+		appendStringInfo(subquerybuf,
+						 "(SELECT 1 FROM %s p WHERE ", pkrelname);
+		ri_GenerateQual(subquerybuf, "AND", riinfo->nkeys,
+						"p", pk_rel, riinfo->pk_attnums,
+						"o", pk_rel, riinfo->pk_attnums,
+						riinfo->pp_eq_oprs);
+		appendStringInfoString(subquerybuf, " FOR KEY SHARE OF p)");
+	}
+
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfoString(querybuf, "SELECT ");
+	ri_GenerateKeyList(querybuf, riinfo->nkeys, "o", pk_rel,
+					   riinfo->pk_attnums);
+	appendStringInfo(querybuf, " FROM %s%s f, tgoldtable o WHERE ",
+					 fk_only, fkrelname);
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					"o", pk_rel, riinfo->pk_attnums,
+					"f", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs);
+	if (no_action)
+		appendStringInfo(querybuf, " AND NOT EXISTS %s", subquerybuf->data);
+	appendStringInfoString(querybuf, " FOR KEY SHARE OF f LIMIT 1");
+
+	return querybuf->data;
+}
 
 /*
  * RI_FKey_cascade_del -
@@ -751,9 +832,9 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable;
 
 	/* Check that this is a valid trigger call on the right time and event. */
 	ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -769,61 +850,52 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
-	oldslot = trigdata->tg_trigslot;
+
+	oldtable = get_event_tuplestore(trigdata, true, NULL);
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
+	if (ri_register_trigger_data(trigdata, oldtable, NULL, NULL) !=
+		SPI_OK_TD_REGISTER)
+		elog(ERROR, "ri_register_trigger_data failed");
+
 	/* Fetch or prepare a saved plan for the cascaded delete */
 	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_DEL_DODELETE);
 
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
-		StringInfoData querybuf;
+		StringInfo	querybuf = makeStringInfo();
+		StringInfo	subquerybuf = makeStringInfo();
 		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
-		Oid			queryoids[RI_MAX_NUMKEYS];
 		const char *fk_only;
 
 		/* ----------
 		 * The query string built is
-		 *	DELETE FROM [ONLY] <fktable> WHERE $1 = fkatt1 [AND ...]
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.
+		 *
+		 *	DELETE FROM [ONLY] <fktable> f
+		 *	    WHERE EXISTS
+		 *			(SELECT 1
+		 *			FROM tgoldtable o
+		 *			WHERE o.pkatt1 = f.fkatt1 [AND ...])
 		 * ----------
 		 */
-		initStringInfo(&querybuf);
+		appendStringInfoString(subquerybuf,
+							   "SELECT 1 FROM tgoldtable o WHERE ");
+		ri_GenerateQual(subquerybuf, "AND", riinfo->nkeys,
+						"o", pk_rel, riinfo->pk_attnums,
+						"f", fk_rel, riinfo->fk_attnums,
+						riinfo->pf_eq_oprs);
+
 		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "DELETE FROM %s%s",
-						 fk_only, fkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
-
-			quoteOneName(attname,
-						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQualComponent(&querybuf, querysep,
-									 paramname, pk_type,
-									 riinfo->pf_eq_oprs[i],
-									 attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
-			querysep = "AND";
-			queryoids[i] = pk_type;
-		}
+		appendStringInfo(querybuf,
+						 "DELETE FROM %s%s f WHERE EXISTS (%s) ",
+						 fk_only, fkrelname, subquerybuf->data);
 
 		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
+		qplan = ri_PlanCheck(querybuf->data, &qkey, fk_rel, pk_rel);
 	}
 
 	/*
@@ -832,7 +904,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	 */
 	ri_PerformCheck(riinfo, &qkey, qplan,
 					fk_rel, pk_rel,
-					oldslot, NULL,
 					true,		/* must detect new rows */
 					SPI_OK_DELETE);
 
@@ -857,10 +928,9 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *newslot;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *table;
 
 	/* Check that this is a valid trigger call on the right time and event. */
 	ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -877,12 +947,22 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
-	newslot = trigdata->tg_newslot;
-	oldslot = trigdata->tg_trigslot;
+
+	/*
+	 * In this case, both old and new values should be in the same tuplestore
+	 * because there's no useful join column.
+	 */
+	table = get_event_tuplestore_for_cascade_update(trigdata, riinfo);
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
+	/* Here it doesn't matter whether we call the table "old" or "new". */
+	if (ri_register_trigger_data(trigdata, NULL, table,
+								 riinfo->desc_pk_both) !=
+		SPI_OK_TD_REGISTER)
+		elog(ERROR, "ri_register_trigger_data failed");
+
 	/* Fetch or prepare a saved plan for the cascaded update */
 	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_UPD_DOUPDATE);
 
@@ -891,21 +971,20 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 		StringInfoData querybuf;
 		StringInfoData qualbuf;
 		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
-		const char *qualsep;
-		Oid			queryoids[RI_MAX_NUMKEYS * 2];
 		const char *fk_only;
+		int			i;
 
 		/* ----------
 		 * The query string built is
-		 *	UPDATE [ONLY] <fktable> SET fkatt1 = $1 [, ...]
-		 *			WHERE $n = fkatt1 [AND ...]
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.  Note that we are assuming
-		 * there is an assignment cast from the PK to the FK type;
-		 * else the parser will fail.
+		 *
+		 * UPDATE [ONLY] <fktable> f
+		 *     SET fkatt1 = n.pkatt1_new [, ...]
+		 *     FROM tgnewtable n
+		 *     WHERE
+		 *         f.fkatt1 = n.pkatt1_old [AND ...]
+		 *
+		 * Note that we are assuming there is an assignment cast from the PK
+		 * to the FK type; else the parser will fail.
 		 * ----------
 		 */
 		initStringInfo(&querybuf);
@@ -913,39 +992,43 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "UPDATE %s%s SET",
-						 fk_only, fkrelname);
-		querysep = "";
-		qualsep = "WHERE";
-		for (int i = 0, j = riinfo->nkeys; i < riinfo->nkeys; i++, j++)
+		appendStringInfo(&querybuf, "UPDATE %s%s f SET ", fk_only, fkrelname);
+
+		for (i = 0; i < riinfo->nkeys; i++)
 		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
+			char	   *latt = ri_ColNameQuoted("", RIAttName(fk_rel, riinfo->fk_attnums[i]));
+			Oid			lcoll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
+			char		ratt[NAMEDATALEN];
+			Oid			rcoll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
 
-			quoteOneName(attname,
-						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-			appendStringInfo(&querybuf,
-							 "%s %s = $%d",
-							 querysep, attname, i + 1);
-			sprintf(paramname, "$%d", j + 1);
-			ri_GenerateQualComponent(&qualbuf, qualsep,
-									 paramname, pk_type,
-									 riinfo->pf_eq_oprs[i],
-									 attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
-			querysep = ",";
-			qualsep = "AND";
-			queryoids[i] = pk_type;
-			queryoids[j] = pk_type;
+			snprintf(ratt, NAMEDATALEN, "n.pkatt%d_new", i + 1);
+
+			if (i > 0)
+				appendStringInfoString(&querybuf, ", ");
+
+			appendStringInfo(&querybuf, "%s = %s", latt, ratt);
+
+			if (lcoll != rcoll)
+				ri_GenerateQualCollation(&querybuf, lcoll);
+		}
+
+		appendStringInfo(&querybuf, " FROM tgnewtable n WHERE");
+
+		for (i = 0; i < riinfo->nkeys; i++)
+		{
+			char	   *fattname;
+
+			if (i > 0)
+				appendStringInfoString(&querybuf, " AND");
+
+			fattname =
+				ri_ColNameQuoted("f",
+								 RIAttName(fk_rel, riinfo->fk_attnums[i]));
+			appendStringInfo(&querybuf, " %s = n.pkatt%d_old", fattname, i + 1);
 		}
-		appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
 
 		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
-							 &qkey, fk_rel, pk_rel);
+		qplan = ri_PlanCheck(querybuf.data, &qkey, fk_rel, pk_rel);
 	}
 
 	/*
@@ -953,7 +1036,6 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	 */
 	ri_PerformCheck(riinfo, &qkey, qplan,
 					fk_rel, pk_rel,
-					oldslot, newslot,
 					true,		/* must detect new rows */
 					SPI_OK_UPDATE);
 
@@ -1038,9 +1120,9 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, true);
@@ -1053,11 +1135,16 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
-	oldslot = trigdata->tg_trigslot;
+
+	oldtable = get_event_tuplestore(trigdata, true, NULL);
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
+	if (ri_register_trigger_data(trigdata, oldtable, NULL, NULL) !=
+		SPI_OK_TD_REGISTER)
+		elog(ERROR, "ri_register_trigger_data failed");
+
 	/*
 	 * Fetch or prepare a saved plan for the set null/default operation (it's
 	 * the same query for delete and update cases)
@@ -1070,38 +1157,28 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
 		StringInfoData querybuf;
-		StringInfoData qualbuf;
 		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
 		const char *querysep;
-		const char *qualsep;
-		Oid			queryoids[RI_MAX_NUMKEYS];
 		const char *fk_only;
 
 		/* ----------
 		 * The query string built is
-		 *	UPDATE [ONLY] <fktable> SET fkatt1 = {NULL|DEFAULT} [, ...]
-		 *			WHERE $1 = fkatt1 [AND ...]
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.
+		 *	UPDATE [ONLY] <fktable> f
+		 *	    SET fkatt1 = {NULL|DEFAULT} [, ...]
+		 *	    FROM tgoldtable o
+		 *		WHERE o.pkatt1 = f.fkatt1 [AND ...]
 		 * ----------
 		 */
 		initStringInfo(&querybuf);
-		initStringInfo(&qualbuf);
 		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "UPDATE %s%s SET",
+		appendStringInfo(&querybuf, "UPDATE %s%s f SET",
 						 fk_only, fkrelname);
 		querysep = "";
-		qualsep = "WHERE";
 		for (int i = 0; i < riinfo->nkeys; i++)
 		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
+			char		attname[MAX_QUOTED_NAME_LEN];
 
 			quoteOneName(attname,
 						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
@@ -1109,22 +1186,17 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 							 "%s %s = %s",
 							 querysep, attname,
 							 is_set_null ? "NULL" : "DEFAULT");
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQualComponent(&qualbuf, qualsep,
-									 paramname, pk_type,
-									 riinfo->pf_eq_oprs[i],
-									 attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
 			querysep = ",";
-			qualsep = "AND";
-			queryoids[i] = pk_type;
 		}
-		appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
+
+		appendStringInfoString(&querybuf, " FROM tgoldtable o WHERE ");
+		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+						"o", pk_rel, riinfo->pk_attnums,
+						"f", fk_rel, riinfo->fk_attnums,
+						riinfo->pf_eq_oprs);
 
 		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
+		qplan = ri_PlanCheck(querybuf.data, &qkey, fk_rel, pk_rel);
 	}
 
 	/*
@@ -1132,7 +1204,6 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	 */
 	ri_PerformCheck(riinfo, &qkey, qplan,
 					fk_rel, pk_rel,
-					oldslot, NULL,
 					true,		/* must detect new rows */
 					SPI_OK_UPDATE);
 
@@ -1542,7 +1613,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 		ri_ReportViolation(&fake_riinfo,
 						   pk_rel, fk_rel,
 						   slot, tupdesc,
-						   RI_PLAN_CHECK_LOOKUPPK, false);
+						   RI_PLAN_CHECK_LOOKUPPK_INS, false);
 
 		ExecDropSingleTupleTableSlot(slot);
 	}
@@ -1847,6 +1918,25 @@ ri_GenerateQualComponent(StringInfo buf,
 							 rightop, rightoptype);
 }
 
+/*
+ * ri_GenerateKeyList --- generate comma-separated list of key attributes.
+ */
+static void
+ri_GenerateKeyList(StringInfo buf, int nkeys,
+				   const char *tabname, Relation rel,
+				   const int16 *attnums)
+{
+	for (int i = 0; i < nkeys; i++)
+	{
+		char	   *att = ri_ColNameQuoted(tabname, RIAttName(rel, attnums[i]));
+
+		if (i > 0)
+			appendStringInfoString(buf, ", ");
+
+		appendStringInfoString(buf, att);
+	}
+}
+
 /*
  * ri_ColNameQuoted() --- return column name, with both table and column name
  * quoted.
@@ -2007,7 +2097,7 @@ ri_FetchConstraintInfo(Trigger *trigger, Relation trig_rel, bool rel_is_pk)
 				 errhint("Remove this referential integrity trigger and its mates, then do ALTER TABLE ADD CONSTRAINT.")));
 
 	/* Find or create a hashtable entry for the constraint */
-	riinfo = ri_LoadConstraintInfo(constraintOid);
+	riinfo = ri_LoadConstraintInfo(constraintOid, trig_rel, rel_is_pk);
 
 	/* Do some easy cross-checks against the trigger call data */
 	if (rel_is_pk)
@@ -2043,12 +2133,15 @@ ri_FetchConstraintInfo(Trigger *trigger, Relation trig_rel, bool rel_is_pk)
  * Fetch or create the RI_ConstraintInfo struct for an FK constraint.
  */
 static const RI_ConstraintInfo *
-ri_LoadConstraintInfo(Oid constraintOid)
+ri_LoadConstraintInfo(Oid constraintOid, Relation trig_rel, bool rel_is_pk)
 {
 	RI_ConstraintInfo *riinfo;
 	bool		found;
 	HeapTuple	tup;
 	Form_pg_constraint conForm;
+	MemoryContext oldcxt;
+	TupleDesc	tupdesc_both;
+	Relation	pk_rel;
 
 	/*
 	 * On the first call initialize the hashtable
@@ -2100,6 +2193,24 @@ ri_LoadConstraintInfo(Oid constraintOid)
 
 	ReleaseSysCache(tup);
 
+	/*
+	 * Construct the descriptor to store both OLD and NEW tuple into when
+	 * processing ON UPDATE CASCADE. Only key columns are needed for that.
+	 */
+	if (rel_is_pk)
+		pk_rel = trig_rel;
+	else
+		pk_rel = table_open(riinfo->pk_relid, AccessShareLock);
+	oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+	tupdesc_both = CreateTemplateTupleDesc(2 * riinfo->nkeys);
+	/* Add the key attributes for both OLD and NEW. */
+	add_key_attrs_to_tupdesc(tupdesc_both, pk_rel, riinfo, true);
+	add_key_attrs_to_tupdesc(tupdesc_both, pk_rel, riinfo, false);
+	MemoryContextSwitchTo(oldcxt);
+	riinfo->desc_pk_both = tupdesc_both;
+	if (!rel_is_pk)
+		table_close(pk_rel, AccessShareLock);
+
 	/*
 	 * For efficient processing of invalidation messages below, we keep a
 	 * doubly-linked list, and a count, of all currently valid entries.
@@ -2165,8 +2276,8 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  * so that we don't need to plan it again.
  */
 static SPIPlanPtr
-ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
-			 RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
+ri_PlanCheck(const char *querystr, RI_QueryKey *qkey, Relation fk_rel,
+			 Relation pk_rel)
 {
 	SPIPlanPtr	qplan;
 	Relation	query_rel;
@@ -2189,7 +2300,7 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
 						   SECURITY_NOFORCE_RLS);
 
 	/* Create the plan */
-	qplan = SPI_prepare(querystr, nargs, argtypes);
+	qplan = SPI_prepare(querystr, 0, NULL);
 
 	if (qplan == NULL)
 		elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
@@ -2211,20 +2322,15 @@ static bool
 ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 				RI_QueryKey *qkey, SPIPlanPtr qplan,
 				Relation fk_rel, Relation pk_rel,
-				TupleTableSlot *oldslot, TupleTableSlot *newslot,
 				bool detectNewRows, int expect_OK)
 {
-	Relation	query_rel,
-				source_rel;
-	bool		source_is_pk;
+	Relation	query_rel;
 	Snapshot	test_snapshot;
 	Snapshot	crosscheck_snapshot;
 	int			limit;
 	int			spi_result;
 	Oid			save_userid;
 	int			save_sec_context;
-	Datum		vals[RI_MAX_NUMKEYS * 2];
-	char		nulls[RI_MAX_NUMKEYS * 2];
 
 	/*
 	 * Use the query type code to determine whether the query is run against
@@ -2235,39 +2341,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	else
 		query_rel = fk_rel;
 
-	/*
-	 * The values for the query are taken from the table on which the trigger
-	 * is called - it is normally the other one with respect to query_rel. An
-	 * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
-	 * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK).  We might eventually
-	 * need some less klugy way to determine this.
-	 */
-	if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
-	{
-		source_rel = fk_rel;
-		source_is_pk = false;
-	}
-	else
-	{
-		source_rel = pk_rel;
-		source_is_pk = true;
-	}
-
-	/* Extract the parameters to be passed into the query */
-	if (newslot)
-	{
-		ri_ExtractValues(source_rel, newslot, riinfo, source_is_pk,
-						 vals, nulls);
-		if (oldslot)
-			ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
-							 vals + riinfo->nkeys, nulls + riinfo->nkeys);
-	}
-	else
-	{
-		ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
-						 vals, nulls);
-	}
-
 	/*
 	 * In READ COMMITTED mode, we just need to use an up-to-date regular
 	 * snapshot, and we will see all rows that could be interesting. But in
@@ -2308,7 +2381,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 
 	/* Finally we can run the query. */
 	spi_result = SPI_execute_snapshot(qplan,
-									  vals, nulls,
+									  NULL, NULL,
 									  test_snapshot, crosscheck_snapshot,
 									  false, false, limit);
 
@@ -2328,30 +2401,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 						RelationGetRelationName(fk_rel)),
 				 errhint("This is most likely due to a rule having rewritten the query.")));
 
-	return SPI_processed != 0;
-}
-
-/*
- * Extract fields from a tuple into Datum/nulls arrays
- */
-static void
-ri_ExtractValues(Relation rel, TupleTableSlot *slot,
-				 const RI_ConstraintInfo *riinfo, bool rel_is_pk,
-				 Datum *vals, char *nulls)
-{
-	const int16 *attnums;
-	bool		isnull;
-
-	if (rel_is_pk)
-		attnums = riinfo->pk_attnums;
-	else
-		attnums = riinfo->fk_attnums;
-
-	for (int i = 0; i < riinfo->nkeys; i++)
-	{
-		vals[i] = slot_getattr(slot, attnums[i], &isnull);
-		nulls[i] = isnull ? 'n' : ' ';
-	}
+	return SPI_processed > 0;
 }
 
 /*
@@ -2378,25 +2428,28 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 	bool		has_perm = true;
 
 	/*
-	 * Determine which relation to complain about.  If tupdesc wasn't passed
-	 * by caller, assume the violator tuple came from there.
+	 * Determine which relation to complain about.
 	 */
-	onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
+	onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK_INS ||
+			queryno == RI_PLAN_CHECK_LOOKUPPK_UPD);
 	if (onfk)
 	{
 		attnums = riinfo->fk_attnums;
 		rel_oid = fk_rel->rd_id;
-		if (tupdesc == NULL)
-			tupdesc = fk_rel->rd_att;
 	}
 	else
 	{
 		attnums = riinfo->pk_attnums;
 		rel_oid = pk_rel->rd_id;
-		if (tupdesc == NULL)
-			tupdesc = pk_rel->rd_att;
 	}
 
+	/*
+	 * If tupdesc wasn't passed by caller, assume the violator tuple matches
+	 * the descriptor of the violatorslot.
+	 */
+	if (tupdesc == NULL)
+		tupdesc = violatorslot->tts_tupleDescriptor;
+
 	/*
 	 * Check permissions- if the user does not have access to view the data in
 	 * any of the key columns then we don't include the errdetail() below.
@@ -2443,8 +2496,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 		initStringInfo(&key_values);
 		for (int idx = 0; idx < riinfo->nkeys; idx++)
 		{
-			int			fnum = attnums[idx];
-			Form_pg_attribute att = TupleDescAttr(tupdesc, fnum - 1);
+			Form_pg_attribute att = TupleDescAttr(tupdesc, idx);
 			char	   *name,
 					   *val;
 			Datum		datum;
@@ -2452,7 +2504,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 
 			name = NameStr(att->attname);
 
-			datum = slot_getattr(violatorslot, fnum, &isnull);
+			datum = slot_getattr(violatorslot, idx + 1, &isnull);
 			if (!isnull)
 			{
 				Oid			foutoid;
@@ -2921,3 +2973,293 @@ RI_FKey_trigger_type(Oid tgfoid)
 
 	return RI_TRIGGER_NONE;
 }
+
+/*
+ * Wrapper around SPI_register_trigger_data() that lets us register the RI
+ * trigger tuplestores w/o having to set tg_oldtable/tg_newtable and also w/o
+ * having to set tgoldtable/tgnewtable in pg_trigger.
+ *
+ * XXX This is rather a hack, try to invent something better.
+ */
+static int
+ri_register_trigger_data(TriggerData *tdata, Tuplestorestate *oldtable,
+						 Tuplestorestate *newtable, TupleDesc desc)
+{
+	TriggerData *td = (TriggerData *) palloc(sizeof(TriggerData));
+	Trigger    *tg = (Trigger *) palloc(sizeof(Trigger));
+	int			result;
+
+	Assert(tdata->tg_trigger->tgoldtable == NULL &&
+		   tdata->tg_trigger->tgnewtable == NULL);
+
+	*td = *tdata;
+
+	td->tg_oldtable = oldtable;
+	td->tg_newtable = newtable;
+
+	*tg = *tdata->tg_trigger;
+	tg->tgoldtable = pstrdup("tgoldtable");
+	tg->tgnewtable = pstrdup("tgnewtable");
+	td->tg_trigger = tg;
+	td->desc = desc;
+
+	result = SPI_register_trigger_data(td);
+
+	return result;
+}
+
+/*
+ * Turn TID array into a tuplestore. If snapshot is passed, only use tuples
+ * visible to this snapshot.
+ */
+static Tuplestorestate *
+get_event_tuplestore(TriggerData *trigdata, bool old, Snapshot snapshot)
+{
+	ResourceOwner saveResourceOwner;
+	Tuplestorestate *result;
+	TIDArray   *ta;
+	ItemPointer it;
+	TupleTableSlot *slot;
+	int			i;
+
+	saveResourceOwner = CurrentResourceOwner;
+	CurrentResourceOwner = CurTransactionResourceOwner;
+	result = tuplestore_begin_heap(false, false, work_mem);
+	CurrentResourceOwner = saveResourceOwner;
+
+	if (old)
+	{
+		ta = trigdata->ri_tids_old;
+		slot = trigdata->tg_trigslot;
+	}
+	else
+	{
+		ta = trigdata->ri_tids_new;
+		slot = trigdata->tg_newslot;
+	}
+
+	it = ta->tids;
+	for (i = 0; i < ta->n; i++)
+	{
+		ExecClearTuple(slot);
+
+		if (!table_tuple_fetch_row_version(trigdata->tg_relation, it,
+										   SnapshotAny, slot))
+		{
+			const char *tuple_kind = old ? "tuple1" : "tuple2";
+
+			elog(ERROR, "failed to fetch %s for AFTER trigger", tuple_kind);
+		}
+
+		if (snapshot)
+		{
+			if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, slot,
+												snapshot))
+				continue;
+
+			/*
+			 * In fact the snapshot is passed iff the slot contains a tuple of
+			 * the FK table being inserted / updated, so perform one more test
+			 * before we add the tuple to the tuplestore. Otherwise we might
+			 * need to remove the tuple later, which would effectively mean to
+			 * create a new tuplestore and put only a subset of tuples into
+			 * it.
+			 */
+			if (!RI_FKey_check_query_required(trigdata->tg_trigger,
+											  trigdata->tg_relation, slot))
+				continue;
+		}
+
+		/*
+		 * TODO  Only store the key attributes.
+		 */
+		tuplestore_puttupleslot(result, slot);
+		it++;
+	}
+
+	return result;
+}
+
+/*
+ * Like get_event_tuplestore(), but put both old and new key values into the
+ * same tuple. If the query (see RI_FKey_cascade_upd) used two tuplestores, it
+ * would have to join them somehow, but there's no suitable join column.
+ */
+static Tuplestorestate *
+get_event_tuplestore_for_cascade_update(TriggerData *trigdata,
+										const RI_ConstraintInfo *riinfo)
+{
+	ResourceOwner saveResourceOwner;
+	Tuplestorestate *result;
+	TIDArray   *ta_old,
+			   *ta_new;
+	ItemPointer it_old,
+				it_new;
+	TupleTableSlot *slot_old,
+			   *slot_new;
+	int			i;
+	Datum	   *values,
+			   *key_values;
+	bool	   *nulls,
+			   *key_nulls;
+	MemoryContext tuple_context;
+	Relation	rel = trigdata->tg_relation;
+	TupleDesc	desc_rel = RelationGetDescr(rel);
+
+	saveResourceOwner = CurrentResourceOwner;
+	CurrentResourceOwner = CurTransactionResourceOwner;
+	result = tuplestore_begin_heap(false, false, work_mem);
+	CurrentResourceOwner = saveResourceOwner;
+
+	/*
+	 * This context will be used for the contents of "values".
+	 *
+	 * CurrentMemoryContext should be the "batch context", as passed to
+	 * AfterTriggerExecuteRI().
+	 */
+	tuple_context =
+		AllocSetContextCreate(CurrentMemoryContext,
+							  "AfterTriggerCascadeUpdateContext",
+							  ALLOCSET_DEFAULT_SIZES);
+
+	ta_old = trigdata->ri_tids_old;
+	ta_new = trigdata->ri_tids_new;
+	Assert(ta_old->n == ta_new->n);
+
+	slot_old = trigdata->tg_trigslot;
+	slot_new = trigdata->tg_newslot;
+
+	key_values = (Datum *) palloc(riinfo->nkeys * 2 * sizeof(Datum));
+	key_nulls = (bool *) palloc(riinfo->nkeys * 2 * sizeof(bool));
+	values = (Datum *) palloc(desc_rel->natts * sizeof(Datum));
+	nulls = (bool *) palloc(desc_rel->natts * sizeof(bool));
+
+	it_old = ta_old->tids;
+	it_new = ta_new->tids;
+	for (i = 0; i < ta_old->n; i++)
+	{
+		MemoryContext oldcxt;
+
+		MemoryContextReset(tuple_context);
+		oldcxt = MemoryContextSwitchTo(tuple_context);
+
+		/* Add values of the PK table, followed by the FK ones. */
+		add_key_values(slot_old, riinfo, trigdata->tg_relation, it_old,
+					   key_values, key_nulls, values, nulls, true);
+		add_key_values(slot_new, riinfo, trigdata->tg_relation, it_new,
+					   key_values, key_nulls, values, nulls, false);
+		MemoryContextSwitchTo(oldcxt);
+
+		tuplestore_putvalues(result, riinfo->desc_pk_both, key_values,
+							 key_nulls);
+
+		it_old++;
+		it_new++;
+	}
+	MemoryContextDelete(tuple_context);
+
+	return result;
+}
+
+/*
+ * Subroutine of get_event_tuplestore_for_cascade_update(), to add key
+ * attributes of the OLD or the NEW table to the tuple descriptor.
+ *
+ */
+static void
+add_key_attrs_to_tupdesc(TupleDesc tupdesc, Relation rel,
+						 const RI_ConstraintInfo *riinfo, bool old)
+{
+	int			i,
+				first;
+	const char *kind;
+
+	first = old ? 1 : riinfo->nkeys + 1;
+	kind = old ? "old" : "new";
+
+	for (i = 0; i < riinfo->nkeys; i++)
+	{
+		int16		attnum;
+		Oid			atttypid;
+		char		attname[NAMEDATALEN];
+		Form_pg_attribute att;
+
+		attnum = riinfo->pk_attnums[i];
+		atttypid = RIAttType(rel, attnum);
+
+		/*
+		 * Generate unique names instead of e.g. using prefix to distinguish
+		 * the old values from new ones. The prefix might be a problem due to
+		 * the limited attribute name length.
+		 */
+		snprintf(attname, NAMEDATALEN, "pkatt%d_%s", i + 1, kind);
+
+		att = tupdesc->attrs;
+		TupleDescInitEntry(tupdesc, first + i, attname, atttypid,
+						   att->atttypmod, att->attndims);
+		att++;
+	}
+}
+
+/*
+ * Retrieve tuple using given slot, deform it and add the attribute values to
+ * "key_values" and "key_null" arrays. "values" and "nulls" is a workspace to
+ * deform the tuple into. "old" tells whether the slot contains data for the
+ * OLD table or for the NEW one.
+ */
+static void
+add_key_values(TupleTableSlot *slot, const RI_ConstraintInfo *riinfo,
+			   Relation rel, ItemPointer ip,
+			   Datum *key_values, bool *key_nulls,
+			   Datum *values, bool *nulls, bool old)
+{
+	HeapTuple	tuple;
+	bool		shouldfree;
+	int			i,
+				n;
+
+	ExecClearTuple(slot);
+	if (!table_tuple_fetch_row_version(rel, ip, SnapshotAny, slot))
+	{
+		const char *tuple_kind = old ? "tuple1" : "tuple2";
+
+		elog(ERROR, "failed to fetch %s for AFTER trigger", tuple_kind);
+	}
+	tuple = ExecFetchSlotHeapTuple(slot, false, &shouldfree);
+
+	heap_deform_tuple(tuple, slot->tts_tupleDescriptor, values, nulls);
+
+	/* Where to start in the output arrays? */
+	n = old ? 0 : riinfo->nkeys;
+
+	/* Pick the key values and store them in the output arrays. */
+	for (i = 0; i < riinfo->nkeys; i++)
+	{
+		int16		attnum = riinfo->pk_attnums[i];
+
+		key_values[n] = values[attnum - 1];
+		key_nulls[n] = nulls[attnum - 1];
+		n++;
+	}
+
+	if (shouldfree)
+		pfree(tuple);
+}
+
+
+/*
+ * Retrieve the row that violates RI constraint and return it in a tuple slot.
+ */
+static TupleTableSlot *
+get_violator_tuple(Relation rel)
+{
+	HeapTuple	tuple;
+	TupleTableSlot *slot;
+
+	Assert(SPI_tuptable && SPI_tuptable->numvals == 1);
+
+	tuple = SPI_tuptable->vals[0];
+	slot = MakeSingleTupleTableSlot(SPI_tuptable->tupdesc, &TTSOpsHeapTuple);
+	ExecStoreHeapTuple(tuple, slot, false);
+	return slot;
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a40ddf5db5..70c214f069 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -27,19 +27,42 @@
 
 typedef uint32 TriggerEvent;
 
+/*
+ * An intermediate storage for TIDs, in order to process multiple events by a
+ * single call of RI trigger.
+ *
+ * XXX Introduce a size limit and make caller of add_tid() aware of it?
+ */
+typedef struct TIDArray
+{
+	ItemPointerData *tids;
+	uint64		n;
+	uint64		nmax;
+} TIDArray;
+
 typedef struct TriggerData
 {
 	NodeTag		type;
+	int			tgindx;
 	TriggerEvent tg_event;
 	Relation	tg_relation;
 	HeapTuple	tg_trigtuple;
 	HeapTuple	tg_newtuple;
 	Trigger    *tg_trigger;
+	bool		is_ri_trigger;
 	TupleTableSlot *tg_trigslot;
 	TupleTableSlot *tg_newslot;
 	Tuplestorestate *tg_oldtable;
 	Tuplestorestate *tg_newtable;
 	const Bitmapset *tg_updatedcols;
+
+	TupleDesc	desc;
+
+	/*
+	 * RI triggers receive TIDs and retrieve the tuples before they're needed.
+	 */
+	TIDArray   *ri_tids_old;
+	TIDArray   *ri_tids_new;
 } TriggerData;
 
 /*
-- 
2.20.1

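To make the generated query shapes concrete: for a single-column foreign key
f(i) referencing p(i), the INSERT-side check that RI_FKey_check_query() builds
comes out roughly like this (a sketch derived from the template in the
function's comment; the real output quotes the relation and column names):

SELECT t.i
    FROM tgoldtable t LEFT JOIN LATERAL
        (SELECT t.i
             FROM ONLY p p
             WHERE t.i = p.i
             FOR KEY SHARE OF p) AS m
         ON t.i = m.i
    WHERE m.i ISNULL
    LIMIT 1

Any row returned by this query is a violator (a new FK row with no matching PK
row); ri_PerformCheck() then reports it through get_violator_tuple() and
ri_ReportViolation().
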
#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Antonin Houska (#1)
Re: More efficient RI checks - take 2

On Wed, Apr 8, 2020 at 6:36 PM Antonin Houska <ah@cybertec.at> wrote:

After having reviewed [1] more than a year ago (the problem I found was
that
the transient table is not available for deferred constraints), I've tried
to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the
queue,
they are all passed to the trigger function at once. Thus the check query
does
not have to be executed that frequently.

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In
general,
the checks are significantly faster if there are many rows to process, and
a
bit slower when we only need to check a single row. However I'm not sure
about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).

It is a hard task to choose a good strategy for immediate constraints, but for
deferred constraints you know how many rows should be checked, and then you
can choose a better strategy.

Is it possible to use estimation for choosing the method of RI checks?

Comments are welcome.

Setup
=====

CREATE TABLE p(i int primary key);
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
CREATE TABLE f(i int REFERENCES p);

Insert many rows into the FK table
==================================

master:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual
time=32.741..32.741 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384
width=4) (actual time=2.403..4.802 rows=16384 loops=1)
Planning Time: 0.050 ms
Trigger for constraint f_i_fkey: time=448.986 calls=16384
Execution Time: 485.444 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual
time=34.053..34.053 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384
width=4) (actual time=2.223..4.448 rows=16384 loops=1)
Planning Time: 0.047 ms
Trigger for constraint f_i_fkey: time=105.164 calls=8
Execution Time: 141.201 ms

Insert a single row into the FK table
=====================================

master:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN

------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060
rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1 loops=1)
Planning Time: 0.026 ms
Trigger for constraint f_i_fkey: time=0.435 calls=1
Execution Time: 0.517 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN

------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066
rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
rows=1 loops=1)
Planning Time: 0.025 ms
Trigger for constraint f_i_fkey: time=0.578 calls=1
Execution Time: 0.670 ms

Check if FK row exists during deletion from the PK
==================================================

master:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint
"f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 3.381 ms

patched:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint
"f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 5.561 ms

Cascaded DELETE --- many PK rows
================================

DROP TABLE f;
CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=38.334..38.334 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.019..3.925 rows=16384 loops=1)
Planning Time: 0.049 ms
Trigger for constraint f_i_fkey: time=31348.756 calls=16384
Execution Time: 31390.784 ms

patched:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=33.360..33.360 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.012..3.183 rows=16384 loops=1)
Planning Time: 0.094 ms
Trigger for constraint f_i_fkey: time=9.580 calls=8
Execution Time: 43.941 ms

Cascaded DELETE --- a single PK row
===================================

INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 5.754 ms

patched:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 8.098 ms

Cascaded UPDATE - many rows
===========================

master:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.80 rows=16384 width=10) (actual time=166.954..166.954 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.80 rows=16384 width=10) (actual time=0.013..7.780 rows=16384 loops=1)
Planning Time: 0.177 ms
Trigger for constraint f_i_fkey on p: time=60405.362 calls=16384
Trigger for constraint f_i_fkey on f: time=455.874 calls=16384
Execution Time: 61036.996 ms

patched:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.77 rows=16382 width=10) (actual time=159.512..159.512 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.77 rows=16382 width=10) (actual time=0.014..7.783 rows=16382 loops=1)
Planning Time: 0.146 ms
Trigger for constraint f_i_fkey on p: time=169.628 calls=9
Trigger for constraint f_i_fkey on f: time=124.079 calls=2
Execution Time: 456.072 ms

Cascaded UPDATE - a single row
==============================

master:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 4.858 ms

patched:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 11.955 ms

[1] https://commitfest.postgresql.org/22/1975/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#3Corey Huinker
corey.huinker@gmail.com
In reply to: Pavel Stehule (#2)
Re: More efficient RI checks - take 2

On Wed, Apr 8, 2020 at 1:06 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

On Wed, Apr 8, 2020 at 6:36 PM Antonin Houska <ah@cybertec.at> wrote:

After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the queue,
they are all passed to the trigger function at once. Thus the check query does
not have to be executed that frequently.

I'm excited that you picked this up!

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).

These numbers are very promising, and much more in line with my initial
expectations. Obviously the impact on single-row DML is of major concern,
though.

It is a hard task to choose a good strategy for immediate constraints, but for
deferred constraints you know how many rows should be checked, and then you
can choose a better strategy.

Is it possible to use estimation for choosing the method of RI checks?

In doing my initial attempt, the feedback I was getting was that the people
who truly understood the RI checks fell into the following groups:
1. people who wanted to remove the SPI calls from the triggers
2. people who wanted to completely refactor RI to not use triggers
3. people who wanted to completely refactor triggers

While #3 is clearly beyond the scope for an endeavor like this, #1 seems
like it would nearly eliminate the 1-row penalty (we'd still have the
TupleStore init penalty, but it would just be a handy queue structure, and
maybe that cost would be offset by removing the SPI overhead), and once
that is done, we could see about step #2.

#4Antonin Houska
ah@cybertec.at
In reply to: Pavel Stehule (#2)
Re: More efficient RI checks - take 2

Pavel Stehule <pavel.stehule@gmail.com> wrote:

On Wed, Apr 8, 2020 at 18:36, Antonin Houska <ah@cybertec.at> wrote:

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).

It is a hard task to choose a good strategy for immediate constraints, but for
deferred constraints you know how many rows should be checked, and then you
can choose a better strategy.

Is it possible to use estimation for choosing the method of RI checks?

The exact number of rows ("batch size") is always known before the query is
executed. So one problem to solve is that, when only one row is affected, we
need to convince the planner that the "transient table" really contains a
single row. Otherwise it can, for example, produce a hash join where the hash
eventually contains a single row.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#5Antonin Houska
ah@cybertec.at
In reply to: Corey Huinker (#3)
Re: More efficient RI checks - take 2

Corey Huinker <corey.huinker@gmail.com> wrote:

These numbers are very promising, and much more in line with my initial
expectations. Obviously the impact on single-row DML is of major concern,
though.

Yes, I agree.

In doing my initial attempt, the feedback I was getting was that the people
who truly understood the RI checks fell into the following groups:

1. people who wanted to remove the SPI calls from the triggers
2. people who wanted to completely refactor RI to not use triggers
3. people who wanted to completely refactor triggers

While #3 is clearly beyond the scope for an endeavor like this, #1 seems
like it would nearly eliminate the 1-row penalty (we'd still have the
TupleStore init penalty, but it would just be a handy queue structure, and
maybe that cost would be offset by removing the SPI overhead),

I can imagine removal of the SPI from the current implementation (and
constructing the plans "manually"), but note that the queries I use in my
patch are no longer that trivial. So the SPI makes sense to me because it
ensures regular query planning.

As for the tuplestore, I'm not sure the startup cost is a problem: if you're
concerned about the 1-row case, the row should usually be stored in memory.

and once that is done, we could see about step #2.

As I said during my review of your patch last year, I think the RI semantics
has too much in common with that of triggers. I'd need more info to imagine
such a change.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#6Corey Huinker
corey.huinker@gmail.com
In reply to: Antonin Houska (#5)
Re: More efficient RI checks - take 2

I can imagine removal of the SPI from the current implementation (and
constructing the plans "manually"), but note that the queries I use in my
patch are no longer that trivial. So the SPI makes sense to me because it
ensures regular query planning.

As an intermediate step, in the case where we have one row, it should be
simple enough to extract that row manually, and do an SPI call with fixed
values rather than the join to the ephemeral table, yes?
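
For concreteness, with the p/f tables from the benchmarks the two shapes would
look roughly like this (only a sketch: the real queries are generated in
ri_triggers.c, go through SPI with parameter markers, and take FOR KEY SHARE
locks; the VALUES list below just stands in for the queued rows):

-- single-row case: probe the PK side with the fixed key value
SELECT 1 FROM ONLY p x WHERE x.i = $1 FOR KEY SHARE OF x;

-- bulk case: anti-join the queued rows against the PK side and
-- look for rows with no match
SELECT v.i
FROM (VALUES (1), (2), (3)) AS v(i)
LEFT JOIN ONLY p x ON x.i = v.i
WHERE x.i IS NULL;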

As for the tuplestore, I'm not sure the startup cost is a problem: if you're
concerned about the 1-row case, the row should usually be stored in memory.

and once that is done, we could see about step #2.

As I said during my review of your patch last year, I think the RI semantics
has too much in common with that of triggers. I'd need more info to imagine
such a change.

As a general outline, I think that DML would iterate over the 2 sets of
potentially relevant RI definitions rather than iterating over the
triggers.

The similarities between RI and general triggers are obvious, which
explains why they went that route initially, but they're also a crutch:
all RI operations boil down to either an iteration over a tuplestore
to do lookups in an index (when checking for referenced rows), or a hash
join of the transient data against the un-indexed table (when checking for
referencing rows), and people who know this stuff far better than me seem to
think that SPI overhead is best avoided when possible. I'm looking forward
to having more time to spend on this.

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Corey Huinker (#6)
Re: More efficient RI checks - take 2

On 2020-Apr-20, Corey Huinker wrote:

I can imagine removal of the SPI from the current implementation (and
constructing the plans "manually"), but note that the queries I use in my
patch are no longer that trivial. So the SPI makes sense to me because it
ensures regular query planning.

As an intermediate step, in the case where we have one row, it should be
simple enough to extract that row manually, and do an SPI call with fixed
values rather than the join to the ephemeral table, yes?

I do wonder if the RI stuff would actually end up being faster without
SPI. If not, we'd only end up writing more code to do the same thing.
Now that tables can be partitioned, it is much more of a pain than when
only regular tables could be supported. Obviously without SPI you
wouldn't *have* to go through the planner, which might be a win in
itself if the execution tree to use were always perfectly clear ... but
now that the queries get more complex per partitioning and this
optimization, is it?

You could remove the crosscheck_snapshot feature from SPI, I suppose,
but that's not that much code.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: More efficient RI checks - take 2

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I do wonder if the RI stuff would actually end up being faster without
SPI. If not, we'd only end up writing more code to do the same thing.
Now that tables can be partitioned, it is much more of a pain than when
only regular tables could be supported. Obviously without SPI you
wouldn't *have* to go through the planner, which might be a win in
itself if the execution tree to use were always perfectly clear ... but
now that the queries get more complex per partitioning and this
optimization, is it?

AFAIK, we do not have any code besides the planner that is capable of
building a plan tree at all, and I'd be pretty hesitant to try to create
such; those things are complicated.

It'd likely only make sense to bypass the planner if the required work
is predictable enough that you don't need a plan tree at all, but can
just hard-wire what has to be done. That seems a bit unlikely in the
presence of partitioning.

Instead of a plan tree, you could build a parse tree to pass through the
planner, rather than building a SQL statement string that has to be
parsed. The code jumps through enough hoops to make sure the string will
be parsed "just so" that this might net out to about an equal amount of
code in ri_triggers.c, and it'd save a nontrivial amount of parsing work.
But you'd have to abandon SPI, probably, or at least it's not clear how
much that'd be doing for you anymore.

regards, tom lane

#9Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#7)
Re: More efficient RI checks - take 2

Hi,

On 2020-04-21 11:34:54 -0400, Alvaro Herrera wrote:

On 2020-Apr-20, Corey Huinker wrote:

I can imagine removal of the SPI from the current implementation (and
constructing the plans "manually"), but note that the queries I use in my
patch are no longer that trivial. So the SPI makes sense to me because it
ensures regular query planning.

As an intermediate step, in the case where we have one row, it should be
simple enough to extract that row manually, and do an SPI call with fixed
values rather than the join to the ephemeral table, yes?

I do wonder if the RI stuff would actually end up being faster without
SPI.

I would suspect so. How much is another question.

I assume that with constructing plans "manually" you don't mean to
create a plan tree, but to invoke parser/planner directly? I think
that'd likely be better than going through SPI, and there's precedent
too.

But honestly, my gut feeling is that for a lot of cases it'd be best to
just bypass parser, planner *and* executor. And just do manual
systable_beginscan() style checks. For most cases we exactly know what
plan shape we expect, and going through the overhead of creating a query
string, parsing, planning, caching the previous steps, and creating an
executor tree for every check is a lot. Even just the amount of memory
for caching the plans can be substantial.

Side note: I for one would appreciate a setting that just made all RI
actions requiring a seqscan error out...

If not, we'd only end up writing more code to do the same thing. Now
that tables can be partitioned, it is much more of a pain than when
only regular tables could be supported. Obviously without SPI you
wouldn't *have* to go through the planner, which might be a win in
itself if the execution tree to use were always perfectly clear
... but now that the queries get more complex per partitioning and
this optimization, is it?

I think it's actually a good case where we will commonly be able to do
*better* than generic planning. The infrastructure for efficient
partition pruning exists (for COPY etc) - but isn't easily applicable to
generic plans.

Greetings,

Andres Freund

#10Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#8)
Re: More efficient RI checks - take 2

Hi,

On 2020-04-21 16:14:53 -0400, Tom Lane wrote:

AFAIK, we do not have any code besides the planner that is capable of
building a plan tree at all, and I'd be pretty hesitant to try to create
such; those things are complicated.

I suspect what was meant was not to create the plan tree directly, but
to bypass SPI when creating the plan / executing the query.

IMO SPI for most uses in core PG really adds more complication and
overhead than warranted. The whole concept of having a global tuptable,
a stack and xact.c integration to repair that design deficiency... The
hiding of what happens behind a pretty random set of different
abstractions. That all makes it appear as if SPI did something super
complicated, but it really doesn't. It just is a bad and
over-complicated abstraction layer.

Greetings,

Andres Freund

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#9)
Re: More efficient RI checks - take 2

On 2020-Apr-22, Andres Freund wrote:

I assume that with constructing plans "manually" you don't mean to
create a plan tree, but to invoke parser/planner directly? I think
that'd likely be better than going through SPI, and there's precedent
too.

Well, I was actually thinking in building ready-made execution trees,
bypassing the planner altogether. But apparently no one thinks that
this is a good idea, and we don't have any code that does that already,
so maybe it's not a great idea.

However:

But honestly, my gut feeling is that for a lot of cases it'd be best to
just bypass parser, planner *and* executor. And just do manual
systable_beginscan() style checks. For most cases we exactly know what
plan shape we expect, and going through the overhead of creating a query
string, parsing, planning, caching the previous steps, and creating an
executor tree for every check is a lot. Even just the amount of memory
for caching the plans can be substantial.

Avoiding the executor altogether scares me, but I can't say exactly why.
For example, you couldn't use foreign tables at either side of the FK --
but we don't allow FKs on those tables and we'd have to use some
specific executor node for such a thing anyway. So this is not a real
argument against going that route.

Side note: I for one would appreciate a setting that just made all RI
actions requiring a seqscan error out...

Hmm, interesting thought. I guess there are actual cases where it's
not strictly necessary, for example where the referencing table is
really tiny -- not the *referenced* table, note, since you need the
UNIQUE index on that side in any case. I suppose that's not a really
interesting case. I don't think this is implementable when going
through SPI.

I think it's actually a good case where we will commonly be able to do
*better* than generic planning. The infrastructure for efficient
partition pruning exists (for COPY etc) - but isn't easily applicable to
generic plans.

True.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#11)
Re: More efficient RI checks - take 2

On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Well, I was actually thinking in building ready-made execution trees,
bypassing the planner altogether. But apparently no one thinks that
this is a good idea, and we don't have any code that does that already,
so maybe it's not a great idea.

If it's any consolation, I had the same idea very recently while
chatting with Amit Langote. Maybe it's a bad idea, but you're not the
only one who had it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#11)
Re: More efficient RI checks - take 2

Hi,

On 2020-04-22 13:18:06 -0400, Alvaro Herrera wrote:

But honestly, my gut feeling is that for a lot of cases it'd be best to
just bypass parser, planner *and* executor. And just do manual
systable_beginscan() style checks. For most cases we exactly know what
plan shape we expect, and going through the overhead of creating a query
string, parsing, planning, caching the previous steps, and creating an
executor tree for every check is a lot. Even just the amount of memory
for caching the plans can be substantial.

Avoiding the executor altogether scares me, but I can't say exactly why.
For example, you couldn't use foreign tables at either side of the FK --
but we don't allow FKs on those tables and we'd have to use some
specific executor node for such a thing anyway. So this is not a real
argument against going that route.

I think it'd also not be that hard to call a specific routine for doing
fkey checks on the remote side. Probably easier to handle things that
way than through "generic" FDW code.

Side note: I for one would appreciate a setting that just made all RI
actions requiring a seqscan error out...

Hmm, interesting thought. I guess there are actual cases where it's
not strictly necessary, for example where the referencing table is
really tiny -- not the *referenced* table, note, since you need the
UNIQUE index on that side in any case. I suppose that's not a really
interesting case.

Yea, the index is pretty much free there. Except I guess for the case of
a tiny table that's super heavily updated.

I don't think this is implementable when going through SPI.

It'd probably be not too hard to approximate by just erroring out when
there's no index on the relevant column, before even doing the planning.

Greetings,

Andres Freund

#14Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#12)
Re: More efficient RI checks - take 2

Hi,

On 2020-04-22 13:46:22 -0400, Robert Haas wrote:

On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Well, I was actually thinking in building ready-made execution trees,
bypassing the planner altogether. But apparently no one thinks that
this is a good idea, and we don't have any code that does that already,
so maybe it's not a great idea.

I was commenting on what I understood Corey to say, but was fairly
unclear about it. But I'm also far from sure that I understood Corey
correctly...

If it's any consolation, I had the same idea very recently while
chatting with Amit Langote. Maybe it's a bad idea, but you're not the
only one who had it. :-)

That seems extremely hard, given our current infrastructure. I think
there's actually a good case to be made for the idea in the abstract,
but ... The amount of logic the ExecInit* routines have is substantial,
the state they set up is complicated. A lot of nodes have state that is
private to their .c files. All executor nodes reference the
corresponding Plan nodes, so you also need to mock up those.

Greetings,

Andres Freund

#15Andres Freund
andres@anarazel.de
In reply to: Corey Huinker (#3)
Re: More efficient RI checks - take 2

Hi,

On 2020-04-08 13:55:55 -0400, Corey Huinker wrote:

In doing my initial attempt, the feedback I was getting was that the people
who truly understood the RI checks fell into the following groups:
1. people who wanted to remove the SPI calls from the triggers
2. people who wanted to completely refactor RI to not use triggers
3. people who wanted to completely refactor triggers

FWIW, for me these three are largely independent avenues:

WRT 1: There's a lot of benefit in reducing the per-call overhead of
RI. Not going through SPI is one way to do that. Even if RI were not to
use triggers, we'd still want to reduce the per-statement costs.

WRT 2: Not using the generic per-row trigger framework for RI has significant
benefits too - checking multiple rows at once, deduplicating repeated
checks, reducing the per-row storage overhead ...

WRT 3: Fairly obviously improving the generic trigger code (more
efficient fetching of tuple versions, spilling etc) would have benefits
entirely independent of other RI improvements.

Greetings,

Andres Freund

#16Corey Huinker
corey.huinker@gmail.com
In reply to: Andres Freund (#14)
Re: More efficient RI checks - take 2

On Wed, Apr 22, 2020 at 2:36 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2020-04-22 13:46:22 -0400, Robert Haas wrote:

On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Well, I was actually thinking in building ready-made execution trees,
bypassing the planner altogether. But apparently no one thinks that
this is a good idea, and we don't have any code that does that already,
so maybe it's not a great idea.

I was commenting on what I understood Corey to say, but was fairly
unclear about it. But I'm also far from sure that I understood Corey
correctly...

I was unclear because, even after my failed foray into statement level
triggers for RI checks, I'm still pretty inexperienced in this area.

I'm just happy that it's being discussed.

#17Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#14)
Re: More efficient RI checks - take 2

On Wed, Apr 22, 2020 at 2:36 PM Andres Freund <andres@anarazel.de> wrote:

If it's any consolation, I had the same idea very recently while
chatting with Amit Langote. Maybe it's a bad idea, but you're not the
only one who had it. :-)

That seems extremely hard, given our current infrastructure. I think
there's actually a good case to be made for the idea in the abstract,
but ... The amount of logic the ExecInit* routines have is substantial,
the state they set up is complicated. A lot of nodes have state that is
private to their .c files. All executor nodes reference the
corresponding Plan nodes, so you also need to mock up those.

Right -- the idea I was talking about was to create a Plan tree
without using the main planner. So it wouldn't bother costing an index
scan on each index, and a sequential scan, on the target table - it
would just make an index scan plan, or maybe an index path that it
would then convert to an index plan. Or something like that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
Re: More efficient RI checks - take 2

Robert Haas <robertmhaas@gmail.com> writes:

Right -- the idea I was talking about was to create a Plan tree
without using the main planner. So it wouldn't bother costing an index
scan on each index, and a sequential scan, on the target table - it
would just make an index scan plan, or maybe an index path that it
would then convert to an index plan. Or something like that.

Consing up a Path tree and then letting create_plan() make it into
an executable plan might not be a terrible idea. There's a whole
boatload of finicky details that you could avoid that way, like
everything in setrefs.c.

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.
Is it really the case that the plan would not vary based on how
many tuples there are to check, for example? If we *do* know
exactly what should happen, I'd tend to lean towards Andres'
idea that we shouldn't be using the executor at all, but just
hard-wiring stuff at the level of "do these table scans".

Also, it's definitely not the case that create_plan() has an API
that's so clean that you would be able to use it without major
hassles. You'd still have to generate a pile of lookalike planner
data structures, and make sure that expression subtrees have been
fed through eval_const_expressions(), etc etc.

On the whole I still think that generating a Query tree and then
letting the planner do its thing might be the best approach.

regards, tom lane

#19Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#18)
Re: More efficient RI checks - take 2

On Wed, Apr 22, 2020 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.
Is it really the case that the plan would not vary based on how
many tuples there are to check, for example? If we *do* know
exactly what should happen, I'd tend to lean towards Andres'
idea that we shouldn't be using the executor at all, but just
hard-wiring stuff at the level of "do these table scans".

Well, I guess I'd naively think we want an index scan on a plain
table. It is barely possible that in some corner case a sequential
scan would be faster, but could it be enough faster to save the cost
of planning? I doubt it, but I just work here.

On a partitioning hierarchy we want to figure out which partition is
relevant for the value we're trying to find, and then scan that one.

I'm not sure there are any other cases. We have to have a UNIQUE
constraint or we can't be referencing this target table. So it can't
be a plain inheritance hierarchy, nor (I think) a foreign table.

Also, it's definitely not the case that create_plan() has an API
that's so clean that you would be able to use it without major
hassles. You'd still have to generate a pile of lookalike planner
data structures, and make sure that expression subtrees have been
fed through eval_const_expressions(), etc etc.

Yeah, that's annoying.

On the whole I still think that generating a Query tree and then
letting the planner do its thing might be the best approach.

Maybe, but it seems awfully heavy-weight. Once you go into the planner
it's pretty hard to avoid considering indexes we don't care about,
bitmap scans we don't care about, a sequential scan we don't care
about, etc. You'd certainly save something just from avoiding
parsing, but planning's pretty expensive too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Antonin Houska
ah@cybertec.at
In reply to: Tom Lane (#18)
Re: More efficient RI checks - take 2

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Right -- the idea I was talking about was to create a Plan tree
without using the main planner. So it wouldn't bother costing an index
scan on each index, and a sequential scan, on the target table - it
would just make an index scan plan, or maybe an index path that it
would then convert to an index plan. Or something like that.

Consing up a Path tree and then letting create_plan() make it into
an executable plan might not be a terrible idea. There's a whole
boatload of finicky details that you could avoid that way, like
everything in setrefs.c.

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.
Is it really the case that the plan would not vary based on how
many tuples there are to check, for example?

I'm concerned about that too. With my patch the checks become a bit slower if
only a single row is processed. The problem seems to be that the planner is
not entirely convinced about the number of input rows, so it can still
build a plan that expects many rows. For example (as I mentioned elsewhere in
the thread), a hash join where the hash table only contains one tuple. Or
similarly a sort node for a single input tuple.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#21Pavel Stehule
pavel.stehule@gmail.com
In reply to: Antonin Houska (#20)
Re: More efficient RI checks - take 2

On Thu, Apr 23, 2020 at 7:06, Antonin Houska <ah@cybertec.at> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Right -- the idea I was talking about was to create a Plan tree
without using the main planner. So it wouldn't bother costing an index
scan on each index, and a sequential scan, on the target table - it
would just make an index scan plan, or maybe an index path that it
would then convert to an index plan. Or something like that.

Consing up a Path tree and then letting create_plan() make it into
an executable plan might not be a terrible idea. There's a whole
boatload of finicky details that you could avoid that way, like
everything in setrefs.c.

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.
Is it really the case that the plan would not vary based on how
many tuples there are to check, for example?

I'm concerned about that too. With my patch the checks become a bit slower if
only a single row is processed. The problem seems to be that the planner is
not entirely convinced about the number of input rows, so it can still
build a plan that expects many rows. For example (as I mentioned elsewhere in
the thread), a hash join where the hash table only contains one tuple. Or
similarly a sort node for a single input tuple.

Without statistics the planner expects about 2000 rows in the table, no?

Pavel

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#22Antonin Houska
ah@cybertec.at
In reply to: Pavel Stehule (#21)
Re: More efficient RI checks - take 2

Pavel Stehule <pavel.stehule@gmail.com> wrote:

On Thu, Apr 23, 2020 at 7:06, Antonin Houska <ah@cybertec.at> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.
Is it really the case that the plan would not vary based on how
many tuples there are to check, for example?

I'm concerned about that too. With my patch the checks become a bit slower if
only a single row is processed. The problem seems to be that the planner is
not entirely convinced about the number of input rows, so it can still
build a plan that expects many rows. For example (as I mentioned elsewhere in
the thread), a hash join where the hash table only contains one tuple. Or
similarly a sort node for a single input tuple.

Without statistics the planner expects about 2000 rows in the table, no?

I think that at some point it estimates the number of rows from the number of
table pages, but I don't remember details.

I wanted to say that if we constructed the plan "manually", we'd need at least
two substantially different variants: one to check many rows and the other to
check a single row.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#23Pavel Stehule
pavel.stehule@gmail.com
In reply to: Antonin Houska (#22)
Re: More efficient RI checks - take 2

On Thu, Apr 23, 2020 at 8:28, Antonin Houska <ah@cybertec.at> wrote:

Pavel Stehule <pavel.stehule@gmail.com> wrote:

On Thu, Apr 23, 2020 at 7:06, Antonin Houska <ah@cybertec.at> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.
Is it really the case that the plan would not vary based on how
many tuples there are to check, for example?

I'm concerned about that too. With my patch the checks become a bit slower if
only a single row is processed. The problem seems to be that the planner is
not entirely convinced about the number of input rows, so it can still
build a plan that expects many rows. For example (as I mentioned elsewhere in
the thread), a hash join where the hash table only contains one tuple. Or
similarly a sort node for a single input tuple.

Without statistics the planner expects about 2000 rows in the table, no?

I think that at some point it estimates the number of rows from the number of
table pages, but I don't remember details.

I wanted to say that if we constructed the plan "manually", we'd need at least
two substantially different variants: one to check many rows and the other to
check a single row.

There can be more variants - a hash join may not be good enough for
bigger data.

The overhead of RI is too big, so I think any solution that is faster
than the current one and can make it into Postgres 14 would be great.

But when you know the input is only one row, you can build a query without
a join.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

#24Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#19)
Re: More efficient RI checks - take 2

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Wed, Apr 22, 2020 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.
Is it really the case that the plan would not vary based on how
many tuples there are to check, for example? If we *do* know
exactly what should happen, I'd tend to lean towards Andres'
idea that we shouldn't be using the executor at all, but just
hard-wiring stuff at the level of "do these table scans".

Well, I guess I'd naively think we want an index scan on a plain
table. It is barely possible that in some corner case a sequential
scan would be faster, but could it be enough faster to save the cost
of planning? I doubt it, but I just work here.

On a partitioning hierarchy we want to figure out which partition is
relevant for the value we're trying to find, and then scan that one.

I'm not sure there are any other cases. We have to have a UNIQUE
constraint or we can't be referencing this target table. So it can't
be a plain inheritance hierarchy, nor (I think) a foreign table.

In the cases where we have a UNIQUE constraint, and therefore a clear
index to use, I tend to agree that we should just be getting to it and
avoiding the planner/executor, as Andres suggest.

I'm not super thrilled about the idea of throwing an ERROR when we
haven't got an index to use though, and we don't require an index on the
referring side, meaning that, with such a change, a DELETE or UPDATE on
the referred table with an ON CASCADE FK will just start throwing
errors. That's not terribly friendly, even if it's not really best
practice to not have an index to help with those cases.

I'd hope that we would at least teach pg_upgrade to look for such cases
and throw errors (though maybe that could be downgraded to a WARNING
with a flag..?) if it finds any when upgrading, so that users don't
upgrade and then suddenly start getting errors for simple statements
that used to work just fine.
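
For illustration only, a rough catalog query along these lines could flag the
candidates (just a sketch, not something pg_upgrade does today, and it only
treats an index whose leading columns match the FK columns in the same order
as "good enough"):

SELECT con.conrelid::regclass AS referencing_table,
       con.conname            AS fk_name
FROM pg_constraint con
WHERE con.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = con.conrelid
          AND (string_to_array(i.indkey::text, ' ')::smallint[])
                  [1:array_length(con.conkey, 1)] = con.conkey);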

On the whole I still think that generating a Query tree and then
letting the planner do its thing might be the best approach.

Maybe, but it seems awfully heavy-weight. Once you go into the planner
it's pretty hard to avoid considering indexes we don't care about,
bitmap scans we don't care about, a sequential scan we don't care
about, etc. You'd certainly save something just from avoiding
parsing, but planning's pretty expensive too.

Agreed.

Thanks,

Stephen

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#19)
Re: More efficient RI checks - take 2

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Apr 22, 2020 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way.

Well, I guess I'd naively think we want an index scan on a plain
table. It is barely possible that in some corner case a sequential
scan would be faster, but could it be enough faster to save the cost
of planning? I doubt it, but I just work here.

I think we're failing to communicate here. I agree that if the goal
is simply to re-implement what the RI triggers currently do --- that
is, retail one-row-at-a-time checks --- then we could probably dispense
with all the parser/planner/executor overhead and directly implement
an indexscan using an API at about the level genam.c provides.
(The issue of whether it's okay to require an index to be available is
annoying, but we could always fall back to the old ways if one is not.)

However, what I thought this thread was about was switching to
statement-level RI checking. At that point, what we're talking
about is performing a join involving a not-known-in-advance number
of tuples on each side. If you think you can hard-wire the choice
of join technology and have it work well all the time, I'm going to
say with complete confidence that you are wrong. The planner spends
huge amounts of effort on that and still doesn't always get it right
... but it does better than a hard-wired choice would do.

Maybe there's room to pursue both things --- you could imagine,
perhaps, looking at the planner's estimate of number of affected
rows at executor startup and deciding from that whether to fire
per-row or per-statement RI triggers. But we're really going to
want different implementations within those two types of triggers.

regards, tom lane

#26Amit Langote
amitlangote09@gmail.com
In reply to: Alvaro Herrera (#11)
Re: More efficient RI checks - take 2

On Thu, Apr 23, 2020 at 2:18 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Apr-22, Andres Freund wrote:

I assume that with constructing plans "manually" you don't mean to
create a plan tree, but to invoke parser/planner directly? I think
that'd likely be better than going through SPI, and there's precedent
too.

Well, I was actually thinking in building ready-made execution trees,
bypassing the planner altogether. But apparently no one thinks that
this is a good idea, and we don't have any code that does that already,
so maybe it's not a great idea.

We do have an instance in validateForeignKeyConstraint() of "manually"
enforcing RI:

If RI_Initial_Check() (a relatively complex query) cannot be
performed, the referencing table is scanned manually and each tuple
thus found is looked up in the referenced table by using
RI_FKey_check_ins(), a simpler query.

Ironically though, RI_Initial_Check() exists to short-circuit the manual algorithm.
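
For the f -> p constraint from the benchmark setup, that validation query is
roughly of this anti-join shape (simplified from memory; the real query is
built in C and also handles multi-column keys, MATCH FULL and collations):

SELECT fk.i
FROM ONLY f fk
LEFT OUTER JOIN ONLY p pk ON pk.i = fk.i
WHERE pk.i IS NULL AND fk.i IS NOT NULL;

Any row returned is a violation.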

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#25)
Re: More efficient RI checks - take 2

On Thu, Apr 23, 2020 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think we're failing to communicate here. I agree that if the goal
is simply to re-implement what the RI triggers currently do --- that
is, retail one-row-at-a-time checks --- then we could probably dispense
with all the parser/planner/executor overhead and directly implement
an indexscan using an API at about the level genam.c provides.
(The issue of whether it's okay to require an index to be available is
annoying, but we could always fall back to the old ways if one is not.)

However, what I thought this thread was about was switching to
statement-level RI checking. At that point, what we're talking
about is performing a join involving a not-known-in-advance number
of tuples on each side. If you think you can hard-wire the choice
of join technology and have it work well all the time, I'm going to
say with complete confidence that you are wrong. The planner spends
huge amounts of effort on that and still doesn't always get it right
... but it does better than a hard-wired choice would do.

Oh, yeah. If we're talking about that, then getting by without using
the planner doesn't seem feasible. Sorry, I guess I didn't read the
thread carefully enough.

As you say, perhaps there's room for both things, but also as you say,
it's not obvious how to decide intelligently between them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#27)
Re: More efficient RI checks - take 2

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Thu, Apr 23, 2020 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think we're failing to communicate here. I agree that if the goal
is simply to re-implement what the RI triggers currently do --- that
is, retail one-row-at-a-time checks --- then we could probably dispense
with all the parser/planner/executor overhead and directly implement
an indexscan using an API at about the level genam.c provides.
(The issue of whether it's okay to require an index to be available is
annoying, but we could always fall back to the old ways if one is not.)

However, what I thought this thread was about was switching to
statement-level RI checking. At that point, what we're talking
about is performing a join involving a not-known-in-advance number
of tuples on each side. If you think you can hard-wire the choice
of join technology and have it work well all the time, I'm going to
say with complete confidence that you are wrong. The planner spends
huge amounts of effort on that and still doesn't always get it right
... but it does better than a hard-wired choice would do.

Oh, yeah. If we're talking about that, then getting by without using
the planner doesn't seem feasible. Sorry, I guess I didn't read the
thread carefully enough.

Yeah, I had been thinking about what we might do with the existing
row-level RI checks too. If we're able to get statement-level without
much impact on the single-row-statement case then that's certainly
interesting, although it sure feels like we're ending up with a lot left
on the table.

As you say, perhaps there's room for both things, but also as you say,
it's not obvious how to decide intelligently between them.

The single-row case seems pretty clear and also seems common enough that
it'd be worth paying the cost to figure out if it's a single-row
statement or not.

Perhaps we start with row-level for the first row, implemented directly
using an index lookup, and when we hit some threshold (maybe even just
"more than one") switch to using the transient table and queue'ing
the rest to check at the end.

What bothers me the most about this approach (though, to be clear, I
think we should still pursue it) is the risk that we might end up
picking a spectacularly bad plan that ends up taking a great deal more
time than the index-probe based approach we almost always have today.
If we limit that impact to only cases where >1 row is involved, then
that's certainly better (though maybe we'll need a GUC for this
anyway..? If we had the single-row approach + the statement-level one,
presumably the GUC would just make us always take the single-row method,
so it hopefully wouldn't be too grotty to have).

Thanks,

Stephen

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#28)
Re: More efficient RI checks - take 2

Stephen Frost <sfrost@snowman.net> writes:

* Robert Haas (robertmhaas@gmail.com) wrote:

As you say, perhaps there's room for both things, but also as you say,
it's not obvious how to decide intelligently between them.

The single-row case seems pretty clear and also seems common enough that
it'd be worth paying the cost to figure out if it's a single-row
statement or not.

That seems hard to do in advance ... but it would be easy to code
a statement-level AFTER trigger along the lines of

if (transition table contains one row)
// fast special case here
else
// slow general case here.
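
To illustrate the mechanism at the user level (only a sketch - the real RI
triggers are C functions, and a proper check also needs FOR KEY SHARE row
locking, NULL handling and MATCH semantics), such a trigger on the benchmark
tables could branch like this:

CREATE FUNCTION f_check_stmt() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    IF (SELECT count(*) FROM new_rows) = 1 THEN
        -- fast special case: a single probe into the PK table
        PERFORM 1 FROM p x JOIN new_rows r ON x.i = r.i;
        IF NOT FOUND THEN
            RAISE foreign_key_violation USING MESSAGE = 'no matching row in p';
        END IF;
    ELSE
        -- slow general case: one set-oriented anti-join for the whole statement
        PERFORM 1 FROM new_rows r LEFT JOIN p x ON x.i = r.i WHERE x.i IS NULL;
        IF FOUND THEN
            RAISE foreign_key_violation USING MESSAGE = 'no matching row in p';
        END IF;
    END IF;
    RETURN NULL;
END;
$$;

CREATE TRIGGER f_stmt_check
    AFTER INSERT ON f
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT EXECUTE FUNCTION f_check_stmt();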

I think the question really comes down to this: is the per-row overhead of
the transition-table mechanism comparable to that of the AFTER trigger
queue? Or if not, can we make it so?

regards, tom lane

#30Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#29)
Re: More efficient RI checks - take 2

Hi,

On 2020-04-28 10:44:58 -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Robert Haas (robertmhaas@gmail.com) wrote:

As you say, perhaps there's room for both things, but also as you say,
it's not obvious how to decide intelligently between them.

The single-row case seems pretty clear and also seems common enough that
it'd be worth paying the cost to figure out if it's a single-row
statement or not.

It's not that obvious to me that it's going to be beneficial to go down
a planned path in all that many cases. If all that the RI check does is
an index_rescan() && index_getnext_slot(), there's not that many
realistic types of plans that are going to be better.

IIUC a query to check a transition table would, simplified, boil down to
either:

SELECT * FROM transition_table tt
WHERE
-- for an insertion/update into the referencing table and
(
NOT EXISTS (SELECT * FROM referenced_table rt WHERE rt.referenced_column = tt.referencing_column)
[AND ... , ]
)
-- for update / delete of referenced table
OR EXISTS (SELECT * FROM referencing_table rt1 WHERE rt1.referencing_column = tt.referenced_column1)
[OR ... , ]
LIMIT 1;

Where a returned row would signal an error. But it would need to handle
row locking, CASCADE/SET NULL/SET DEFAULT etc. More on that below.

While it's tempting to want to write the latter check as

-- for update / delete of referenced table
SELECT * FROM referencing_table rt
WHERE referencing_column IN (SELECT referenced_column FROM transition_table tt)
LIMIT 1;
that'd make it harder to know the violating row.

As the transition table isn't ordered it seems like there's not that
many realistic ways to execute this:

1) A nestloop semi/anti-join with an inner index scan
2) Sort transition table, do a merge semi/anti-join between sort and an
ordered index scan on the referenced / referencing table(s).
3) Hash semi/anti-join, requiring a full table scan of the tables

I think 1) would be worse than just doing the indexscan manually. 2)
would probably be beneficial if there's a lot of rows on the inner side,
due to the ordered access and deduplication. 3) would sometimes be
beneficial because it'd avoid an index scan for each tuple in the
transition table.

The cases in which it is clear to me that a bulk check could
theoretically be significantly better than a fast per-row check are:

1) There's a *lot* of rows in the transition table relative to comparatively
small referenced / referencing tables. As those tables can cheaply be
hashed, a hashjoin will be be more efficient than doing a index lookup
for each transition table row.
2) There's a lot of duplicate content in the transition
table. E.g. because there's a lot of references to the same row.

Did I forget important ones?

With regard to the row locking etc that I elided above: I think that
actually will prevent most if not all interesting optimizations: Because
of the FOR KEY SHARE that's required, the planner plan will pretty much
always degrade to a per row subplan check anyway. Is there any
formulation of the necessary queries that don't suffer from this
problem?

That seems hard to do in advance ... but it would be easy to code
a statement-level AFTER trigger along the lines of

if (transition table contains one row)
// fast special case here
else
// slow general case here.

I suspect we'd need something more complicated than this for it to be
beneficial. My gut feeling would be that the case where a transition
table style check would be most commonly beneficial is when you have a
very small referenced table, and a *lot* of rows get inserted. But
clearly we wouldn't want to have bulk insertion suddenly also store all
rows in a transition table.

Nor would we want to have a bulk UPDATE cause all the updated rows to be
stored in the transition table, even though none of the relevant columns
changed (i.e. the RI_FKey_[f]pk_upd_check_required logic in
AfterTriggerSaveEvent()).

I still don't quite see how shunting RI checks through triggers saves us
more than it costs:

Especially for the stuff we do as AFTER: Most of the time we could do
the work we defer till query end immediately, rather than building up an
in-memory queue. Besides saving memory, in a lot of cases that'd also
make it unnecessary to refetch the row at a later time, possibly needing
to chase updated row versions.

But even for the BEFORE checks, largely going through generic trigger
code means it's much harder to batch checks without suddenly requiring
memory proportional to the number of inserted rows.

There obviously are cases where it's not possible to check just after
each row. Deferrable constraints, as well as CASCADE / SET NULL /
SET DEFAULT on tables with user defined triggers, for example. But it'd
likely be sensible to handle that in the way we already handle deferred
uniqueness checks, i.e. we only queue something if there's a potential
for a problem.

I think the question really comes down to this: is the per-row overhead of
the transition-table mechanism comparable to that of the AFTER trigger
queue? Or if not, can we make it so?

It's probably more expensive, in some ways, at the moment. The biggest
difference is that the transition table stores complete rows, valid as
of the moment they've been inserted/updated/deleted, whereas the trigger
queue only stores enough information to fetch the tuple again during
trigger execution. Several RI checks however re-check visibility before
executing, so that's another fetch, that'd likely not be elided by a
simple change to using transition tables.

Both have significant downsides, obviously. Storing complete rows can
take a lot more memory, and refetching rows is expensive, especially if
it happens much later (with the row pushed out of shared_buffers
potentially).

I think it was a mistake to have these two different systems in
trigger.c. When transition tables were added we shouldn't have kept
per-tuple state both in the queue and in the transition
tuplestore. Instead we should have only used the tuplestore, and
optimized what information we store inside depending on the need of the
various after triggers.

Greetings,

Andres Freund

#31Antonin Houska
ah@cybertec.at
In reply to: Antonin Houska (#1)
5 attachment(s)
Re: More efficient RI checks - take 2

Antonin Houska <ah@cybertec.at> wrote:

In general, the checks are significantly faster if there are many rows to
process, and a bit slower when we only need to check a single row.

Attached is a new version that uses the existing simple queries if there's
only one row to check. SPI is used for both single-row and bulk checks - as
discussed in this thread, it can perhaps be replaced with a different approach
if that appears to be beneficial, at least for the single-row checks.

I think using a separate query for the single-row check is more practicable
than convincing the planner that the bulk-check query should only check a
single row. So this patch version tries to show what it'd look like.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachments:

v02-0001-Check-for-RI-violation-outside-ri_PerformCheck.patch (text/x-diff)
From 6c1cb8ae7fbf0a8122d8c6637c61b9915bc57223 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 1/5] Check for RI violation outside ri_PerformCheck().

---
 src/backend/utils/adt/ri_triggers.c | 40 ++++++++++++++---------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index bb49e80d16..6220872126 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -389,11 +389,16 @@ RI_FKey_check(TriggerData *trigdata)
 	/*
 	 * Now check that foreign key exists in PK table
 	 */
-	ri_PerformCheck(riinfo, &qkey, qplan,
-					fk_rel, pk_rel,
-					NULL, newslot,
-					false,
-					SPI_OK_SELECT);
+	if (!ri_PerformCheck(riinfo, &qkey, qplan,
+						 fk_rel, pk_rel,
+						 NULL, newslot,
+						 false,
+						 SPI_OK_SELECT))
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   newslot,
+						   NULL,
+						   qkey.constr_queryno, false);
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -708,11 +713,16 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	/*
 	 * We have a plan now. Run it to check for existing references.
 	 */
-	ri_PerformCheck(riinfo, &qkey, qplan,
-					fk_rel, pk_rel,
-					oldslot, NULL,
-					true,		/* must detect new rows */
-					SPI_OK_SELECT);
+	if (ri_PerformCheck(riinfo, &qkey, qplan,
+						fk_rel, pk_rel,
+						oldslot, NULL,
+						true,	/* must detect new rows */
+						SPI_OK_SELECT))
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   oldslot,
+						   NULL,
+						   qkey.constr_queryno, false);
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -2288,16 +2298,6 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 						RelationGetRelationName(fk_rel)),
 				 errhint("This is most likely due to a rule having rewritten the query.")));
 
-	/* XXX wouldn't it be clearer to do this part at the caller? */
-	if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
-		expect_OK == SPI_OK_SELECT &&
-		(SPI_processed == 0) == (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK))
-		ri_ReportViolation(riinfo,
-						   pk_rel, fk_rel,
-						   newslot ? newslot : oldslot,
-						   NULL,
-						   qkey->constr_queryno, false);
-
 	return SPI_processed != 0;
 }
 
-- 
2.20.1

v02-0002-Changed-ri_GenerateQual-so-it-generates-the-whole-qu.patch (text/x-diff)
From 6b09e5598553c8e57b4ef9342912f51adb48f8af Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 2/5] Changed ri_GenerateQual() so it generates the whole
 qualifier.

This way we can use the function to reduce the amount of copy&pasted code a
bit.
---
 src/backend/utils/adt/ri_triggers.c | 435 ++++++++++++++--------------
 1 file changed, 216 insertions(+), 219 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 6220872126..f08a83067b 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -180,11 +180,31 @@ static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
 static Datum ri_set(TriggerData *trigdata, bool is_set_null);
 static void quoteOneName(char *buffer, const char *name);
 static void quoteRelationName(char *buffer, Relation rel);
-static void ri_GenerateQual(StringInfo buf,
-							const char *sep,
-							const char *leftop, Oid leftoptype,
-							Oid opoid,
-							const char *rightop, Oid rightoptype);
+static char *ri_ColNameQuoted(const char *tabname, const char *attname);
+
+/*
+ * Use one of these values to tell ri_GenerateQual() where the parameter
+ * markers ($1, $2, ...) should appear in the qualifier.
+ */
+typedef enum GenQualParams
+{
+	GQ_PARAMS_NONE,				/* No parameters, only attribute names. */
+	GQ_PARAMS_LEFT,				/* The left side of the qual contains
+								 * parameters. */
+	GQ_PARAMS_RIGHT,			/* The right side of the qual contains
+								 * parameters. */
+} GenQualParams;
+static void ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
+							const char *ltabname, Relation lrel,
+							const int16 *lattnums,
+							const char *rtabname, Relation rrel,
+							const int16 *rattnums, const Oid *eq_oprs,
+							GenQualParams params, Oid *paramtypes);
+static void ri_GenerateQualComponent(StringInfo buf,
+									 const char *sep,
+									 const char *leftop, Oid leftoptype,
+									 Oid opoid,
+									 const char *rightop, Oid rightoptype);
 static void ri_GenerateQualCollation(StringInfo buf, Oid collation);
 static int	ri_NullCheck(TupleDesc tupdesc, TupleTableSlot *slot,
 						 const RI_ConstraintInfo *riinfo, bool rel_is_pk);
@@ -343,9 +363,6 @@ RI_FKey_check(TriggerData *trigdata)
 	{
 		StringInfoData querybuf;
 		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
 		Oid			queryoids[RI_MAX_NUMKEYS];
 		const char *pk_only;
 
@@ -361,25 +378,14 @@ RI_FKey_check(TriggerData *trigdata)
 		pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(pkrelname, pk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
+		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s p WHERE ",
 						 pk_only, pkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-
-			quoteOneName(attname,
-						 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							attname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							paramname, fk_type);
-			querysep = "AND";
-			queryoids[i] = fk_type;
-		}
-		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
+		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+						NULL, pk_rel, riinfo->pk_attnums,
+						NULL, fk_rel, riinfo->fk_attnums,
+						riinfo->pf_eq_oprs,
+						GQ_PARAMS_RIGHT, queryoids);
+		appendStringInfoString(&querybuf, " FOR KEY SHARE OF p");
 
 		/* Prepare and save the plan */
 		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
@@ -476,9 +482,6 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 	{
 		StringInfoData querybuf;
 		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
 		const char *pk_only;
 		Oid			queryoids[RI_MAX_NUMKEYS];
 
@@ -494,23 +497,15 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 		pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(pkrelname, pk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
+		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x WHERE ",
 						 pk_only, pkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
 
-			quoteOneName(attname,
-						 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							attname, pk_type,
-							riinfo->pp_eq_oprs[i],
-							paramname, pk_type);
-			querysep = "AND";
-			queryoids[i] = pk_type;
-		}
+		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+						NULL, pk_rel, riinfo->pk_attnums,
+						NULL, fk_rel, riinfo->fk_attnums,
+						riinfo->pf_eq_oprs,
+						GQ_PARAMS_RIGHT,
+						queryoids);
 		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
 
 		/* Prepare and save the plan */
@@ -663,9 +658,6 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	{
 		StringInfoData querybuf;
 		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
 		Oid			queryoids[RI_MAX_NUMKEYS];
 		const char *fk_only;
 
@@ -681,28 +673,16 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
+		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x WHERE ",
 						 fk_only, fkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
 
-			quoteOneName(attname,
-						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
-			querysep = "AND";
-			queryoids[i] = pk_type;
-		}
+		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+						NULL, pk_rel, riinfo->pk_attnums,
+						NULL, fk_rel, riinfo->fk_attnums,
+						riinfo->pf_eq_oprs,
+						GQ_PARAMS_LEFT,
+						queryoids);
+
 		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
 
 		/* Prepare and save the plan */
@@ -775,9 +755,6 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	{
 		StringInfoData querybuf;
 		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
 		Oid			queryoids[RI_MAX_NUMKEYS];
 		const char *fk_only;
 
@@ -792,28 +769,15 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "DELETE FROM %s%s",
-						 fk_only, fkrelname);
-		querysep = "WHERE";
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
 
-			quoteOneName(attname,
-						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&querybuf, querysep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
-			querysep = "AND";
-			queryoids[i] = pk_type;
-		}
+		appendStringInfo(&querybuf, "DELETE FROM %s%s WHERE ", fk_only,
+						 fkrelname);
+		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+						NULL, pk_rel, riinfo->pk_attnums,
+						NULL, fk_rel, riinfo->fk_attnums,
+						riinfo->pf_eq_oprs,
+						GQ_PARAMS_LEFT,
+						queryoids);
 
 		/* Prepare and save the plan */
 		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
@@ -924,10 +888,10 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 							 "%s %s = $%d",
 							 querysep, attname, i + 1);
 			sprintf(paramname, "$%d", j + 1);
-			ri_GenerateQual(&qualbuf, qualsep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
+			ri_GenerateQualComponent(&qualbuf, qualsep,
+									 paramname, pk_type,
+									 riinfo->pf_eq_oprs[i],
+									 attname, fk_type);
 			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
 				ri_GenerateQualCollation(&querybuf, pk_coll);
 			querysep = ",";
@@ -1064,12 +1028,7 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
 		StringInfoData querybuf;
-		StringInfoData qualbuf;
 		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
-		const char *qualsep;
 		Oid			queryoids[RI_MAX_NUMKEYS];
 		const char *fk_only;
 
@@ -1082,39 +1041,32 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 		 * ----------
 		 */
 		initStringInfo(&querybuf);
-		initStringInfo(&qualbuf);
 		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(fkrelname, fk_rel);
 		appendStringInfo(&querybuf, "UPDATE %s%s SET",
 						 fk_only, fkrelname);
-		querysep = "";
-		qualsep = "WHERE";
+
 		for (int i = 0; i < riinfo->nkeys; i++)
 		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
+			char		attname[MAX_QUOTED_NAME_LEN];
+			const char *sep = i > 0 ? "," : "";
 
 			quoteOneName(attname,
 						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
+
 			appendStringInfo(&querybuf,
 							 "%s %s = %s",
-							 querysep, attname,
+							 sep, attname,
 							 is_set_null ? "NULL" : "DEFAULT");
-			sprintf(paramname, "$%d", i + 1);
-			ri_GenerateQual(&qualbuf, qualsep,
-							paramname, pk_type,
-							riinfo->pf_eq_oprs[i],
-							attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
-			querysep = ",";
-			qualsep = "AND";
-			queryoids[i] = pk_type;
 		}
-		appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
+
+		appendStringInfo(&querybuf, " WHERE ");
+		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+						NULL, pk_rel, riinfo->pk_attnums,
+						NULL, fk_rel, riinfo->fk_attnums,
+						riinfo->pf_eq_oprs,
+						GQ_PARAMS_LEFT, queryoids);
 
 		/* Prepare and save the plan */
 		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
@@ -1402,31 +1354,14 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 		"" : "ONLY ";
 	appendStringInfo(&querybuf,
-					 " FROM %s%s fk LEFT OUTER JOIN %s%s pk ON",
+					 " FROM %s%s fk LEFT OUTER JOIN %s%s pk ON (",
 					 fk_only, fkrelname, pk_only, pkrelname);
 
-	strcpy(pkattname, "pk.");
-	strcpy(fkattname, "fk.");
-	sep = "(";
-	for (int i = 0; i < riinfo->nkeys; i++)
-	{
-		Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-		Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
-
-		quoteOneName(pkattname + 3,
-					 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-		quoteOneName(fkattname + 3,
-					 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-		ri_GenerateQual(&querybuf, sep,
-						pkattname, pk_type,
-						riinfo->pf_eq_oprs[i],
-						fkattname, fk_type);
-		if (pk_coll != fk_coll)
-			ri_GenerateQualCollation(&querybuf, pk_coll);
-		sep = "AND";
-	}
+	ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+					"pk", pk_rel, riinfo->pk_attnums,
+					"fk", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_NONE, NULL);
 
 	/*
 	 * It's sufficient to test any one pk attribute for null to detect a join
@@ -1584,7 +1519,6 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	char	   *constraintDef;
 	char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
 	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-	char		pkattname[MAX_QUOTED_NAME_LEN + 3];
 	char		fkattname[MAX_QUOTED_NAME_LEN + 3];
 	const char *sep;
 	const char *fk_only;
@@ -1633,30 +1567,14 @@ RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 		"" : "ONLY ";
 	appendStringInfo(&querybuf,
-					 " FROM %s%s fk JOIN %s pk ON",
+					 " FROM %s%s fk JOIN %s pk ON (",
 					 fk_only, fkrelname, pkrelname);
-	strcpy(pkattname, "pk.");
-	strcpy(fkattname, "fk.");
-	sep = "(";
-	for (i = 0; i < riinfo->nkeys; i++)
-	{
-		Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-		Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-		Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
-
-		quoteOneName(pkattname + 3,
-					 RIAttName(pk_rel, riinfo->pk_attnums[i]));
-		quoteOneName(fkattname + 3,
-					 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-		ri_GenerateQual(&querybuf, sep,
-						pkattname, pk_type,
-						riinfo->pf_eq_oprs[i],
-						fkattname, fk_type);
-		if (pk_coll != fk_coll)
-			ri_GenerateQualCollation(&querybuf, pk_coll);
-		sep = "AND";
-	}
+
+	ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+					"pk", pk_rel, riinfo->pk_attnums,
+					"fk", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_NONE, NULL);
 
 	/*
 	 * Start the WHERE clause with the partition constraint (except if this is
@@ -1820,7 +1738,63 @@ quoteRelationName(char *buffer, Relation rel)
 }
 
 /*
- * ri_GenerateQual --- generate a WHERE clause equating two variables
+ * ri_GenerateQual --- generate WHERE/ON clause.
+ *
+ * Note: to avoid unnecessary explicit casts, make sure that the left and
+ * right operands match eq_oprs expect (ie don't swap the left and right
+ * operands accidentally).
+ */
+static void
+ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
+				const char *ltabname, Relation lrel,
+				const int16 *lattnums,
+				const char *rtabname, Relation rrel,
+				const int16 *rattnums,
+				const Oid *eq_oprs,
+				GenQualParams params,
+				Oid *paramtypes)
+{
+	for (int i = 0; i < nkeys; i++)
+	{
+		Oid			ltype = RIAttType(lrel, lattnums[i]);
+		Oid			rtype = RIAttType(rrel, rattnums[i]);
+		Oid			lcoll = RIAttCollation(lrel, lattnums[i]);
+		Oid			rcoll = RIAttCollation(rrel, rattnums[i]);
+		char		paramname[16];
+		char	   *latt,
+				   *ratt;
+		char	   *sep_current = i > 0 ? sep : NULL;
+
+		if (params != GQ_PARAMS_NONE)
+			sprintf(paramname, "$%d", i + 1);
+
+		if (params == GQ_PARAMS_LEFT)
+		{
+			latt = paramname;
+			paramtypes[i] = ltype;
+		}
+		else
+			latt = ri_ColNameQuoted(ltabname, RIAttName(lrel, lattnums[i]));
+
+		if (params == GQ_PARAMS_RIGHT)
+		{
+			ratt = paramname;
+			paramtypes[i] = rtype;
+		}
+		else
+			ratt = ri_ColNameQuoted(rtabname, RIAttName(rrel, rattnums[i]));
+
+		ri_GenerateQualComponent(buf, sep_current, latt, ltype, eq_oprs[i],
+								 ratt, rtype);
+
+		if (lcoll != rcoll)
+			ri_GenerateQualCollation(buf, lcoll);
+	}
+}
+
+/*
+ * ri_GenerateQualComponent --- generate one component of a WHERE/ON clause
+ * equating two variables, to be AND-ed to the other components.
  *
  * This basically appends " sep leftop op rightop" to buf, adding casts
  * and schema qualification as needed to ensure that the parser will select
@@ -1828,17 +1802,86 @@ quoteRelationName(char *buffer, Relation rel)
  * if they aren't variables or parameters.
  */
 static void
-ri_GenerateQual(StringInfo buf,
-				const char *sep,
-				const char *leftop, Oid leftoptype,
-				Oid opoid,
-				const char *rightop, Oid rightoptype)
+ri_GenerateQualComponent(StringInfo buf,
+						 const char *sep,
+						 const char *leftop, Oid leftoptype,
+						 Oid opoid,
+						 const char *rightop, Oid rightoptype)
 {
-	appendStringInfo(buf, " %s ", sep);
+	if (sep)
+		appendStringInfo(buf, " %s ", sep);
 	generate_operator_clause(buf, leftop, leftoptype, opoid,
 							 rightop, rightoptype);
 }
 
+/*
+ * ri_ColNameQuoted() --- return a column reference, with both the table and
+ * the column name quoted.
+ */
+static char *
+ri_ColNameQuoted(const char *tabname, const char *attname)
+{
+	char		quoted[MAX_QUOTED_NAME_LEN];
+	StringInfo	result = makeStringInfo();
+
+	if (tabname && strlen(tabname) > 0)
+	{
+		quoteOneName(quoted, tabname);
+		appendStringInfo(result, "%s.", quoted);
+	}
+
+	quoteOneName(quoted, attname);
+	appendStringInfoString(result, quoted);
+
+	return result->data;
+}
+
+/*
+ * Check that RI trigger function was called in expected context
+ */
+static void
+ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
+{
+	TriggerData *trigdata = (TriggerData *) fcinfo->context;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+				 errmsg("function \"%s\" was not called by trigger manager", funcname)));
+
+	/*
+	 * Check proper event
+	 */
+	if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
+		!TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+				 errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
+
+	switch (tgkind)
+	{
+		case RI_TRIGTYPE_INSERT:
+			if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for INSERT", funcname)));
+			break;
+		case RI_TRIGTYPE_UPDATE:
+			if (!TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for UPDATE", funcname)));
+			break;
+
+		case RI_TRIGTYPE_DELETE:
+			if (!TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for DELETE", funcname)));
+			break;
+	}
+}
+
 /*
  * ri_GenerateQualCollation --- add a COLLATE spec to a WHERE clause
  *
@@ -1909,52 +1952,6 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
 	key->constr_queryno = constr_queryno;
 }
 
-/*
- * Check that RI trigger function was called in expected context
- */
-static void
-ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
-{
-	TriggerData *trigdata = (TriggerData *) fcinfo->context;
-
-	if (!CALLED_AS_TRIGGER(fcinfo))
-		ereport(ERROR,
-				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-				 errmsg("function \"%s\" was not called by trigger manager", funcname)));
-
-	/*
-	 * Check proper event
-	 */
-	if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
-		!TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
-		ereport(ERROR,
-				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-				 errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
-
-	switch (tgkind)
-	{
-		case RI_TRIGTYPE_INSERT:
-			if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-						 errmsg("function \"%s\" must be fired for INSERT", funcname)));
-			break;
-		case RI_TRIGTYPE_UPDATE:
-			if (!TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-						 errmsg("function \"%s\" must be fired for UPDATE", funcname)));
-			break;
-		case RI_TRIGTYPE_DELETE:
-			if (!TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
-				ereport(ERROR,
-						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
-						 errmsg("function \"%s\" must be fired for DELETE", funcname)));
-			break;
-	}
-}
-
-
 /*
  * Fetch the RI_ConstraintInfo struct for the trigger's FK constraint.
  */
-- 
2.20.1

v02-0003-Return-early-from-ri_NullCheck-if-possible.patch (text/x-diff)
From bf25d57d5b1ab6d295221047af15246cd8dfcf5a Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 3/5] Return early from ri_NullCheck() if possible.

---
 src/backend/utils/adt/ri_triggers.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index f08a83067b..647b102be1 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2508,6 +2508,13 @@ ri_NullCheck(TupleDesc tupDesc,
 			nonenull = false;
 		else
 			allnull = false;
+
+		/*
+		 * If we have seen both NULL and non-NULL values, the remaining
+		 * attributes cannot change the result.
+		 */
+		if (!nonenull && !allnull)
+			return RI_KEYS_SOME_NULL;
 	}
 
 	if (allnull)
@@ -2516,7 +2523,8 @@ ri_NullCheck(TupleDesc tupDesc,
 	if (nonenull)
 		return RI_KEYS_NONE_NULL;
 
-	return RI_KEYS_SOME_NULL;
+	/* Should not happen. */
+	Assert(false);
 }
 
 
-- 
2.20.1

v02-0004-Introduce-infrastructure-for-batch-processing-RI-eve.patch (text/x-diff)
From 208c733d759592402901599446b4f7e7197c1777 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 4/5] Introduce infrastructure for batch processing RI events.

Separate storage is used for the RI trigger events because the "transient
table" that we provide to statement triggers would not be available for
deferred constraints. Also, a regular statement-level trigger is not ideal
for the RI checks because it requires the query execution to complete before
the RI checks even start. With batches of row trigger events, on the other
hand, we only need to tune the batch size so that the user gets an RI
violation error reasonably soon.

This patch only introduces the infrastructure; the trigger function is still
called once per event. This is just to keep the size of the diffs down.
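
Roughly, the batching idea looks like the sketch below. This is only an
illustration with made-up names (RIBatch, ri_batch_add, ri_batch_flush); the
actual implementation is the TIDArray / add_tid / AfterTriggerExecuteRI code
in trigger.c further down.

/*
 * Illustrative sketch only -- not code from the patch.  Collect the row
 * trigger events in a growable array and fire the RI check once per batch
 * instead of once per row.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct RIBatch
{
	unsigned long *tids;		/* stand-in for ItemPointerData */
	int			n;				/* events collected so far */
	int			nmax;			/* allocated capacity */
} RIBatch;

/* Grow the array in fixed-size chunks, as add_tid() does in the patch. */
static void
ri_batch_add(RIBatch *b, unsigned long tid)
{
	if (b->n == b->nmax)
	{
		b->nmax += 1024;
		b->tids = realloc(b->tids, b->nmax * sizeof(unsigned long));
	}
	b->tids[b->n++] = tid;
}

/* Run the RI check once for everything collected, then reset the batch. */
static void
ri_batch_flush(RIBatch *b)
{
	if (b->n == 0)
		return;
	printf("checking %d rows with a single query\n", b->n);
	b->n = 0;					/* ready for the next batch */
}

int
main(void)
{
	RIBatch		batch = {0};

	/* 16384 queued events result in a handful of checks, not 16384. */
	for (unsigned long tid = 1; tid <= 16384; tid++)
	{
		ri_batch_add(&batch, tid);
		if (batch.n == 4096)	/* hypothetical batch size */
			ri_batch_flush(&batch);
	}
	ri_batch_flush(&batch);		/* flush the remainder */
	free(batch.tids);
	return 0;
}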
---
 src/backend/commands/tablecmds.c    |   68 +-
 src/backend/commands/trigger.c      |  406 ++++++--
 src/backend/executor/spi.c          |   16 +-
 src/backend/utils/adt/ri_triggers.c | 1385 +++++++++++++++++++--------
 src/include/commands/trigger.h      |   25 +
 5 files changed, 1381 insertions(+), 519 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2ab02e01a0..8f4dd07bf7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -10326,6 +10326,15 @@ validateForeignKeyConstraint(char *conname,
 	MemoryContext oldcxt;
 	MemoryContext perTupCxt;
 
+	LOCAL_FCINFO(fcinfo, 0);
+	TriggerData trigdata = {0};
+	ResourceOwner saveResourceOwner;
+	Tuplestorestate *table;
+	TupleDesc	tupdesc;
+	const int16 *attnums;
+	Datum	   *values;
+	bool	   *isnull;
+
 	ereport(DEBUG1,
 			(errmsg("validating foreign key constraint \"%s\"", conname)));
 
@@ -10360,41 +10369,58 @@ validateForeignKeyConstraint(char *conname,
 	slot = table_slot_create(rel, NULL);
 	scan = table_beginscan(rel, snapshot, 0, NULL);
 
+	saveResourceOwner = CurrentResourceOwner;
+	CurrentResourceOwner = CurTransactionResourceOwner;
+	table = tuplestore_begin_heap(false, false, work_mem);
+	CurrentResourceOwner = saveResourceOwner;
+
+	/* Retrieve information on FK attributes. */
+	tupdesc = RI_FKey_fk_attributes(&trig, rel, &attnums);
+	values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
+	isnull = (bool *) palloc(tupdesc->natts * sizeof(bool));
+
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
 	oldcxt = MemoryContextSwitchTo(perTupCxt);
 
+	/* Store the rows to be checked, but only the FK attributes. */
 	while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 	{
-		LOCAL_FCINFO(fcinfo, 0);
-		TriggerData trigdata = {0};
+		int			i;
 
 		CHECK_FOR_INTERRUPTS();
 
-		/*
-		 * Make a call to the trigger function
-		 *
-		 * No parameters are passed, but we do set a context
-		 */
-		MemSet(fcinfo, 0, SizeForFunctionCallInfo(0));
-
-		/*
-		 * We assume RI_FKey_check_ins won't look at flinfo...
-		 */
-		trigdata.type = T_TriggerData;
-		trigdata.tg_event = TRIGGER_EVENT_INSERT | TRIGGER_EVENT_ROW;
-		trigdata.tg_relation = rel;
-		trigdata.tg_trigtuple = ExecFetchSlotHeapTuple(slot, false, NULL);
-		trigdata.tg_trigslot = slot;
-		trigdata.tg_trigger = &trig;
-
-		fcinfo->context = (Node *) &trigdata;
+		for (i = 0; i < slot->tts_tupleDescriptor->natts; i++)
+			values[i] = slot_getattr(slot, attnums[i], &isnull[i]);
 
-		RI_FKey_check_ins(fcinfo);
+		tuplestore_putvalues(table, tupdesc, values, isnull);
 
 		MemoryContextReset(perTupCxt);
 	}
+	pfree(values);
+	pfree(isnull);
+
+	/*
+	 * Make a call to the trigger function
+	 *
+	 * No parameters are passed, but we do set a context
+	 */
+	MemSet(fcinfo, 0, SizeForFunctionCallInfo(0));
+
+	/*
+	 * We assume RI_FKey_check_ins won't look at flinfo...
+	 */
+	trigdata.type = T_TriggerData;
+	trigdata.tg_event = TRIGGER_EVENT_INSERT | TRIGGER_EVENT_ROW;
+	trigdata.tg_relation = rel;
+	trigdata.tg_trigslot = slot;
+	trigdata.tg_trigger = &trig;
+	trigdata.tg_oldtable = table;
+
+	fcinfo->context = (Node *) &trigdata;
+
+	RI_FKey_check_ins(fcinfo);
 
 	MemoryContextSwitchTo(oldcxt);
 	MemoryContextDelete(perTupCxt);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 672fccff5b..c240988471 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -105,6 +105,8 @@ static void AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 static void AfterTriggerEnlargeQueryState(void);
 static bool before_stmt_triggers_fired(Oid relid, CmdType cmdType);
 
+static TIDArray *alloc_tid_array(void);
+static void add_tid(TIDArray *ta, ItemPointer item);
 
 /*
  * Create a trigger.  Returns the address of the created trigger.
@@ -3337,10 +3339,14 @@ typedef struct AfterTriggerEventList
 /* Macros to help in iterating over a list of events */
 #define for_each_chunk(cptr, evtlist) \
 	for (cptr = (evtlist).head; cptr != NULL; cptr = cptr->next)
+#define next_event_in_chunk(eptr, cptr) \
+	(AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr))
 #define for_each_event(eptr, cptr) \
 	for (eptr = (AfterTriggerEvent) CHUNK_DATA_START(cptr); \
 		 (char *) eptr < (cptr)->freeptr; \
-		 eptr = (AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr)))
+		 eptr = next_event_in_chunk(eptr, cptr))
+#define is_last_event_in_chunk(eptr, cptr) \
+	((((char *) eptr) + SizeofTriggerEvent(eptr)) >= (cptr)->freeptr)
 /* Use this if no special per-chunk processing is needed */
 #define for_each_event_chunk(eptr, cptr, evtlist) \
 	for_each_chunk(cptr, evtlist) for_each_event(eptr, cptr)
@@ -3488,9 +3494,17 @@ static void AfterTriggerExecute(EState *estate,
 								TriggerDesc *trigdesc,
 								FmgrInfo *finfo,
 								Instrumentation *instr,
+								TriggerData *trig_last,
 								MemoryContext per_tuple_context,
+								MemoryContext batch_context,
 								TupleTableSlot *trig_tuple_slot1,
 								TupleTableSlot *trig_tuple_slot2);
+static void AfterTriggerExecuteRI(EState *estate,
+								  ResultRelInfo *relInfo,
+								  FmgrInfo *finfo,
+								  Instrumentation *instr,
+								  TriggerData *trig_last,
+								  MemoryContext batch_context);
 static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid,
 														 CmdType cmdType);
 static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs);
@@ -3807,13 +3821,16 @@ afterTriggerDeleteHeadEventChunk(AfterTriggersQueryData *qs)
  *	fmgr lookup cache space at the caller level.  (For triggers fired at
  *	the end of a query, we can even piggyback on the executor's state.)
  *
- *	event: event currently being fired.
+ *	event: event currently being fired. Pass NULL if the current batch of RI
+ *		trigger events should be processed.
  *	rel: open relation for event.
  *	trigdesc: working copy of rel's trigger info.
  *	finfo: array of fmgr lookup cache entries (one per trigger in trigdesc).
  *	instr: array of EXPLAIN ANALYZE instrumentation nodes (one per trigger),
  *		or NULL if no instrumentation is wanted.
+ *	trig_last: trigger info used for the last trigger execution.
  *	per_tuple_context: memory context to call trigger function in.
+ *	batch_context: memory context to store tuples for RI triggers.
  *	trig_tuple_slot1: scratch slot for tg_trigtuple (foreign tables only)
  *	trig_tuple_slot2: scratch slot for tg_newtuple (foreign tables only)
  * ----------
@@ -3824,39 +3841,55 @@ AfterTriggerExecute(EState *estate,
 					ResultRelInfo *relInfo,
 					TriggerDesc *trigdesc,
 					FmgrInfo *finfo, Instrumentation *instr,
+					TriggerData *trig_last,
 					MemoryContext per_tuple_context,
+					MemoryContext batch_context,
 					TupleTableSlot *trig_tuple_slot1,
 					TupleTableSlot *trig_tuple_slot2)
 {
 	Relation	rel = relInfo->ri_RelationDesc;
 	AfterTriggerShared evtshared = GetTriggerSharedData(event);
 	Oid			tgoid = evtshared->ats_tgoid;
-	TriggerData LocTriggerData = {0};
 	HeapTuple	rettuple;
-	int			tgindx;
 	bool		should_free_trig = false;
 	bool		should_free_new = false;
+	bool		is_new = false;
 
-	/*
-	 * Locate trigger in trigdesc.
-	 */
-	for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
+	if (trig_last->tg_trigger == NULL)
 	{
-		if (trigdesc->triggers[tgindx].tgoid == tgoid)
+		int			tgindx;
+
+		/*
+		 * Locate trigger in trigdesc.
+		 */
+		for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
 		{
-			LocTriggerData.tg_trigger = &(trigdesc->triggers[tgindx]);
-			break;
+			if (trigdesc->triggers[tgindx].tgoid == tgoid)
+			{
+				trig_last->tg_trigger = &(trigdesc->triggers[tgindx]);
+				trig_last->tgindx = tgindx;
+				break;
+			}
 		}
+		if (trig_last->tg_trigger == NULL)
+			elog(ERROR, "could not find trigger %u", tgoid);
+
+		if (RI_FKey_trigger_type(trig_last->tg_trigger->tgfoid) !=
+			RI_TRIGGER_NONE)
+			trig_last->is_ri_trigger = true;
+
+		is_new = true;
 	}
-	if (LocTriggerData.tg_trigger == NULL)
-		elog(ERROR, "could not find trigger %u", tgoid);
+
+	/* trig_last may only be reused across calls by RI triggers. */
+	Assert(trig_last->is_ri_trigger || is_new);
 
 	/*
 	 * If doing EXPLAIN ANALYZE, start charging time to this trigger. We want
 	 * to include time spent re-fetching tuples in the trigger cost.
 	 */
-	if (instr)
-		InstrStartNode(instr + tgindx);
+	if (instr && !trig_last->is_ri_trigger)
+		InstrStartNode(instr + trig_last->tgindx);
 
 	/*
 	 * Fetch the required tuple(s).
@@ -3864,6 +3897,9 @@ AfterTriggerExecute(EState *estate,
 	switch (event->ate_flags & AFTER_TRIGGER_TUP_BITS)
 	{
 		case AFTER_TRIGGER_FDW_FETCH:
+			/* Foreign keys are not supported on foreign tables. */
+			Assert(!trig_last->is_ri_trigger);
+
 			{
 				Tuplestorestate *fdw_tuplestore = GetCurrentFDWTuplestore();
 
@@ -3879,6 +3915,8 @@ AfterTriggerExecute(EState *estate,
 			}
 			/* fall through */
 		case AFTER_TRIGGER_FDW_REUSE:
+			/* Foreign keys are not supported on foreign tables. */
+			Assert(!trig_last->is_ri_trigger);
 
 			/*
 			 * Store tuple in the slot so that tg_trigtuple does not reference
@@ -3889,38 +3927,56 @@ AfterTriggerExecute(EState *estate,
 			 * that is stored as a heap tuple, constructed in different memory
 			 * context, in the slot anyway.
 			 */
-			LocTriggerData.tg_trigslot = trig_tuple_slot1;
-			LocTriggerData.tg_trigtuple =
+			trig_last->tg_trigslot = trig_tuple_slot1;
+			trig_last->tg_trigtuple =
 				ExecFetchSlotHeapTuple(trig_tuple_slot1, true, &should_free_trig);
 
 			if ((evtshared->ats_event & TRIGGER_EVENT_OPMASK) ==
 				TRIGGER_EVENT_UPDATE)
 			{
-				LocTriggerData.tg_newslot = trig_tuple_slot2;
-				LocTriggerData.tg_newtuple =
+				trig_last->tg_newslot = trig_tuple_slot2;
+				trig_last->tg_newtuple =
 					ExecFetchSlotHeapTuple(trig_tuple_slot2, true, &should_free_new);
 			}
 			else
 			{
-				LocTriggerData.tg_newtuple = NULL;
+				trig_last->tg_newtuple = NULL;
 			}
 			break;
 
 		default:
 			if (ItemPointerIsValid(&(event->ate_ctid1)))
 			{
-				LocTriggerData.tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
+				if (!trig_last->is_ri_trigger)
+				{
+					trig_last->tg_trigslot = ExecGetTriggerOldSlot(estate,
+																   relInfo);
 
-				if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid1),
-												   SnapshotAny,
-												   LocTriggerData.tg_trigslot))
-					elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
-				LocTriggerData.tg_trigtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_trigslot, false, &should_free_trig);
+					if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid1),
+													   SnapshotAny,
+													   trig_last->tg_trigslot))
+						elog(ERROR, "failed to fetch tuple1 for AFTER trigger");
+
+					trig_last->tg_trigtuple =
+						ExecFetchSlotHeapTuple(trig_last->tg_trigslot, false,
+											   &should_free_trig);
+				}
+				else
+				{
+					if (trig_last->ri_tids_old == NULL)
+					{
+						MemoryContext oldcxt;
+
+						oldcxt = MemoryContextSwitchTo(batch_context);
+						trig_last->ri_tids_old = alloc_tid_array();
+						MemoryContextSwitchTo(oldcxt);
+					}
+					add_tid(trig_last->ri_tids_old, &(event->ate_ctid1));
+				}
 			}
 			else
 			{
-				LocTriggerData.tg_trigtuple = NULL;
+				trig_last->tg_trigtuple = NULL;
 			}
 
 			/* don't touch ctid2 if not there */
@@ -3928,18 +3984,36 @@ AfterTriggerExecute(EState *estate,
 				AFTER_TRIGGER_2CTID &&
 				ItemPointerIsValid(&(event->ate_ctid2)))
 			{
-				LocTriggerData.tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
+				if (!trig_last->is_ri_trigger)
+				{
+					trig_last->tg_newslot = ExecGetTriggerNewSlot(estate,
+																  relInfo);
 
-				if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid2),
-												   SnapshotAny,
-												   LocTriggerData.tg_newslot))
-					elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
-				LocTriggerData.tg_newtuple =
-					ExecFetchSlotHeapTuple(LocTriggerData.tg_newslot, false, &should_free_new);
+					if (!table_tuple_fetch_row_version(rel, &(event->ate_ctid2),
+													   SnapshotAny,
+													   trig_last->tg_newslot))
+						elog(ERROR, "failed to fetch tuple2 for AFTER trigger");
+
+					trig_last->tg_newtuple =
+						ExecFetchSlotHeapTuple(trig_last->tg_newslot, false,
+											   &should_free_new);
+				}
+				else
+				{
+					if (trig_last->ri_tids_new == NULL)
+					{
+						MemoryContext oldcxt;
+
+						oldcxt = MemoryContextSwitchTo(batch_context);
+						trig_last->ri_tids_new = alloc_tid_array();
+						MemoryContextSwitchTo(oldcxt);
+					}
+					add_tid(trig_last->ri_tids_new, &(event->ate_ctid2));
+				}
 			}
 			else
 			{
-				LocTriggerData.tg_newtuple = NULL;
+				trig_last->tg_newtuple = NULL;
 			}
 	}
 
@@ -3949,19 +4023,26 @@ AfterTriggerExecute(EState *estate,
 	 * a trigger, mark it "closed" so that it cannot change anymore.  If any
 	 * additional events of the same type get queued in the current trigger
 	 * query level, they'll go into new transition tables.
+	 *
+	 * RI triggers treat the tuplestores specially, see above.
 	 */
-	LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
+	if (!trig_last->is_ri_trigger)
+		trig_last->tg_oldtable = trig_last->tg_newtable = NULL;
+
 	if (evtshared->ats_table)
 	{
-		if (LocTriggerData.tg_trigger->tgoldtable)
+		/* There shouldn't be any transition table for an RI trigger event. */
+		Assert(!trig_last->is_ri_trigger);
+
+		if (trig_last->tg_trigger->tgoldtable)
 		{
-			LocTriggerData.tg_oldtable = evtshared->ats_table->old_tuplestore;
+			trig_last->tg_oldtable = evtshared->ats_table->old_tuplestore;
 			evtshared->ats_table->closed = true;
 		}
 
-		if (LocTriggerData.tg_trigger->tgnewtable)
+		if (trig_last->tg_trigger->tgnewtable)
 		{
-			LocTriggerData.tg_newtable = evtshared->ats_table->new_tuplestore;
+			trig_last->tg_newtable = evtshared->ats_table->new_tuplestore;
 			evtshared->ats_table->closed = true;
 		}
 	}
@@ -3969,54 +4050,139 @@ AfterTriggerExecute(EState *estate,
 	/*
 	 * Setup the remaining trigger information
 	 */
-	LocTriggerData.type = T_TriggerData;
-	LocTriggerData.tg_event =
-		evtshared->ats_event & (TRIGGER_EVENT_OPMASK | TRIGGER_EVENT_ROW);
-	LocTriggerData.tg_relation = rel;
-	if (TRIGGER_FOR_UPDATE(LocTriggerData.tg_trigger->tgtype))
-		LocTriggerData.tg_updatedcols = evtshared->ats_modifiedcols;
-
-	MemoryContextReset(per_tuple_context);
+	if (is_new)
+	{
+		trig_last->type = T_TriggerData;
+		trig_last->tg_event =
+			evtshared->ats_event & (TRIGGER_EVENT_OPMASK | TRIGGER_EVENT_ROW);
+		trig_last->tg_relation = rel;
+		if (TRIGGER_FOR_UPDATE(trig_last->tg_trigger->tgtype))
+			trig_last->tg_updatedcols = evtshared->ats_modifiedcols;
+	}
 
 	/*
-	 * Call the trigger and throw away any possibly returned updated tuple.
-	 * (Don't let ExecCallTriggerFunc measure EXPLAIN time.)
+	 * RI triggers are executed in batches, see the top of the function.
 	 */
-	rettuple = ExecCallTriggerFunc(&LocTriggerData,
-								   tgindx,
-								   finfo,
-								   NULL,
-								   per_tuple_context);
-	if (rettuple != NULL &&
-		rettuple != LocTriggerData.tg_trigtuple &&
-		rettuple != LocTriggerData.tg_newtuple)
-		heap_freetuple(rettuple);
+	if (!trig_last->is_ri_trigger)
+	{
+		MemoryContextReset(per_tuple_context);
+
+		/*
+		 * Call the trigger and throw away any possibly returned updated
+		 * tuple. (Don't let ExecCallTriggerFunc measure EXPLAIN time.)
+		 */
+		rettuple = ExecCallTriggerFunc(trig_last,
+									   trig_last->tgindx,
+									   finfo,
+									   NULL,
+									   per_tuple_context);
+		if (rettuple != NULL &&
+			rettuple != trig_last->tg_trigtuple &&
+			rettuple != trig_last->tg_newtuple)
+			heap_freetuple(rettuple);
+	}
 
 	/*
 	 * Release resources
 	 */
 	if (should_free_trig)
-		heap_freetuple(LocTriggerData.tg_trigtuple);
+		heap_freetuple(trig_last->tg_trigtuple);
 	if (should_free_new)
-		heap_freetuple(LocTriggerData.tg_newtuple);
+		heap_freetuple(trig_last->tg_newtuple);
 
-	/* don't clear slots' contents if foreign table */
-	if (trig_tuple_slot1 == NULL)
+	/*
+	 * Don't clear slots' contents if foreign table.
+	 *
+	 * For RI triggers we manage these slots separately, see
+	 * AfterTriggerExecuteRI().
+	 */
+	if (trig_tuple_slot1 == NULL && !trig_last->is_ri_trigger)
 	{
-		if (LocTriggerData.tg_trigslot)
-			ExecClearTuple(LocTriggerData.tg_trigslot);
-		if (LocTriggerData.tg_newslot)
-			ExecClearTuple(LocTriggerData.tg_newslot);
+		if (trig_last->tg_trigslot)
+			ExecClearTuple(trig_last->tg_trigslot);
+		if (trig_last->tg_newslot)
+			ExecClearTuple(trig_last->tg_newslot);
 	}
 
 	/*
 	 * If doing EXPLAIN ANALYZE, stop charging time to this trigger, and count
 	 * one "tuple returned" (really the number of firings).
 	 */
-	if (instr)
-		InstrStopNode(instr + tgindx, 1);
+	if (instr && !trig_last->is_ri_trigger)
+		InstrStopNode(instr + trig_last->tgindx, 1);
+
+	/* RI triggers use trig_last across calls. */
+	if (!trig_last->is_ri_trigger)
+		memset(trig_last, 0, sizeof(TriggerData));
 }
 
+/*
+ * AfterTriggerExecuteRI()
+ *
+ * Execute an RI trigger. It's assumed that AfterTriggerExecute() recognized
+ * RI trigger events and only added them to the batch instead of executing
+ * them. The actual processing of the batch is done by this function.
+ */
+static void
+AfterTriggerExecuteRI(EState *estate,
+					  ResultRelInfo *relInfo,
+					  FmgrInfo *finfo,
+					  Instrumentation *instr,
+					  TriggerData *trig_last,
+					  MemoryContext batch_context)
+{
+	HeapTuple	rettuple;
+
+	/*
+	 * AfterTriggerExecute() must have been called for this trigger already.
+	 */
+	Assert(trig_last->tg_trigger);
+	Assert(trig_last->is_ri_trigger);
+
+	/*
+	 * The RI trigger constructs a local tuplestore when it needs one; it might
+	 * need to check visibility first. If we put the tuples into a tuplestore
+	 * now, it'd be hard to keep pins on the containing buffers, and so the
+	 * table_tuple_satisfies_snapshot() check wouldn't work.
+	 */
+	Assert(trig_last->tg_oldtable == NULL);
+	Assert(trig_last->tg_newtable == NULL);
+
+	/* Initialize the slots to retrieve the rows by TID. */
+	trig_last->tg_trigslot = ExecGetTriggerOldSlot(estate, relInfo);
+	trig_last->tg_newslot = ExecGetTriggerNewSlot(estate, relInfo);
+
+	if (instr)
+		InstrStartNode(instr + trig_last->tgindx);
+
+	/*
+	 * Call the trigger and throw away any possibly returned updated tuple.
+	 * (Don't let ExecCallTriggerFunc measure EXPLAIN time.)
+	 *
+	 * batch_context already contains the TIDs of the affected rows. The RI
+	 * trigger should also use this context to create the tuplestore for them.
+	 */
+	rettuple = ExecCallTriggerFunc(trig_last,
+								   trig_last->tgindx,
+								   finfo,
+								   NULL,
+								   batch_context);
+	if (rettuple != NULL &&
+		rettuple != trig_last->tg_trigtuple &&
+		rettuple != trig_last->tg_newtuple)
+		heap_freetuple(rettuple);
+
+	if (instr)
+		InstrStopNode(instr + trig_last->tgindx, 1);
+
+	ExecClearTuple(trig_last->tg_trigslot);
+	ExecClearTuple(trig_last->tg_newslot);
+
+	MemoryContextReset(batch_context);
+
+	memset(trig_last, 0, sizeof(TriggerData));
+	return;
+}
 
 /*
  * afterTriggerMarkEvents()
@@ -4112,7 +4278,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 {
 	bool		all_fired = true;
 	AfterTriggerEventChunk *chunk;
-	MemoryContext per_tuple_context;
+	MemoryContext per_tuple_context,
+				batch_context;
 	bool		local_estate = false;
 	ResultRelInfo *rInfo = NULL;
 	Relation	rel = NULL;
@@ -4121,6 +4288,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 	Instrumentation *instr = NULL;
 	TupleTableSlot *slot1 = NULL,
 			   *slot2 = NULL;
+	TriggerData trig_last;
 
 	/* Make a local EState if need be */
 	if (estate == NULL)
@@ -4134,6 +4302,14 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 		AllocSetContextCreate(CurrentMemoryContext,
 							  "AfterTriggerTupleContext",
 							  ALLOCSET_DEFAULT_SIZES);
+	/* Separate context for a batch of RI trigger events. */
+	batch_context =
+		AllocSetContextCreate(CurrentMemoryContext,
+							  "AfterTriggerBatchContext",
+							  ALLOCSET_DEFAULT_SIZES);
+
+	/* No trigger executed yet in this batch. */
+	memset(&trig_last, 0, sizeof(TriggerData));
 
 	for_each_chunk(chunk, *events)
 	{
@@ -4150,6 +4326,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 			if ((event->ate_flags & AFTER_TRIGGER_IN_PROGRESS) &&
 				evtshared->ats_firing_id == firing_id)
 			{
+				bool		fire_ri_batch = false;
+
 				/*
 				 * So let's fire it... but first, find the correct relation if
 				 * this is not the same relation as before.
@@ -4180,12 +4358,60 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				}
 
 				/*
-				 * Fire it.  Note that the AFTER_TRIGGER_IN_PROGRESS flag is
-				 * still set, so recursive examinations of the event list
-				 * won't try to re-fire it.
+				 * Fire it (or add the corresponding tuple(s) to the current
+				 * batch if it's an RI trigger).
+				 *
+				 * Note that the AFTER_TRIGGER_IN_PROGRESS flag is still set,
+				 * so recursive examinations of the event list won't try to
+				 * re-fire it.
 				 */
-				AfterTriggerExecute(estate, event, rInfo, trigdesc, finfo, instr,
-									per_tuple_context, slot1, slot2);
+				AfterTriggerExecute(estate, event, rInfo, trigdesc, finfo,
+									instr, &trig_last,
+									per_tuple_context, batch_context,
+									slot1, slot2);
+
+				/*
+				 * RI trigger events are processed in batches, so extra work
+				 * might be needed to finish the current batch. It's important
+				 * to do this before the chunk iteration ends because the
+				 * trigger execution may generate other events.
+				 *
+				 * XXX Implement maximum batch size so that constraint
+				 * violations are reported as soon as possible?
+				 */
+				if (trig_last.tg_trigger && trig_last.is_ri_trigger)
+				{
+					if (is_last_event_in_chunk(event, chunk))
+						fire_ri_batch = true;
+					else
+					{
+						AfterTriggerEvent evtnext;
+						AfterTriggerShared evtshnext;
+
+						/*
+						 * We even need to look ahead because the next event
+						 * might be affected by execution of the current one.
+						 * For example, the next event might be an AS trigger
+						 * event that gets cancelled (cancel_prior_stmt_triggers)
+						 * because the current event, during its execution,
+						 * generates a new AS event for the same trigger.
+						 */
+						evtnext = next_event_in_chunk(event, chunk);
+						evtshnext = GetTriggerSharedData(evtnext);
+
+						if (evtshnext != evtshared)
+							fire_ri_batch = true;
+					}
+				}
+
+				if (fire_ri_batch)
+					AfterTriggerExecuteRI(estate,
+										  rInfo,
+										  finfo,
+										  instr,
+										  &trig_last,
+										  batch_context);
+
 
 				/*
 				 * Mark the event as done.
@@ -4216,6 +4442,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				events->tailfree = chunk->freeptr;
 		}
 	}
+
 	if (slot1 != NULL)
 	{
 		ExecDropSingleTupleTableSlot(slot1);
@@ -4224,6 +4451,7 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 
 	/* Release working resources */
 	MemoryContextDelete(per_tuple_context);
+	MemoryContextDelete(batch_context);
 
 	if (local_estate)
 	{
@@ -5812,3 +6040,29 @@ pg_trigger_depth(PG_FUNCTION_ARGS)
 {
 	PG_RETURN_INT32(MyTriggerDepth);
 }
+
+static TIDArray *
+alloc_tid_array(void)
+{
+	TIDArray   *result = (TIDArray *) palloc(sizeof(TIDArray));
+
+	/* XXX Tune the chunk size. */
+	result->nmax = 1024;
+	result->tids = (ItemPointer) palloc(result->nmax *
+										sizeof(ItemPointerData));
+	result->n = 0;
+	return result;
+}
+
+static void
+add_tid(TIDArray *ta, ItemPointer item)
+{
+	if (ta->n == ta->nmax)
+	{
+		ta->nmax += 1024;
+		ta->tids = (ItemPointer) repalloc(ta->tids,
+										  ta->nmax * sizeof(ItemPointerData));
+	}
+	memcpy(ta->tids + ta->n, item, sizeof(ItemPointerData));
+	ta->n++;
+}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b108168821..37026219b6 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2929,12 +2929,14 @@ SPI_register_trigger_data(TriggerData *tdata)
 	if (tdata->tg_newtable)
 	{
 		EphemeralNamedRelation enr =
-		palloc(sizeof(EphemeralNamedRelationData));
+		palloc0(sizeof(EphemeralNamedRelationData));
 		int			rc;
 
 		enr->md.name = tdata->tg_trigger->tgnewtable;
-		enr->md.reliddesc = tdata->tg_relation->rd_id;
-		enr->md.tupdesc = NULL;
+		if (tdata->desc)
+			enr->md.tupdesc = tdata->desc;
+		else
+			enr->md.reliddesc = tdata->tg_relation->rd_id;
 		enr->md.enrtype = ENR_NAMED_TUPLESTORE;
 		enr->md.enrtuples = tuplestore_tuple_count(tdata->tg_newtable);
 		enr->reldata = tdata->tg_newtable;
@@ -2946,12 +2948,14 @@ SPI_register_trigger_data(TriggerData *tdata)
 	if (tdata->tg_oldtable)
 	{
 		EphemeralNamedRelation enr =
-		palloc(sizeof(EphemeralNamedRelationData));
+		palloc0(sizeof(EphemeralNamedRelationData));
 		int			rc;
 
 		enr->md.name = tdata->tg_trigger->tgoldtable;
-		enr->md.reliddesc = tdata->tg_relation->rd_id;
-		enr->md.tupdesc = NULL;
+		if (tdata->desc)
+			enr->md.tupdesc = tdata->desc;
+		else
+			enr->md.reliddesc = tdata->tg_relation->rd_id;
 		enr->md.enrtype = ENR_NAMED_TUPLESTORE;
 		enr->md.enrtuples = tuplestore_tuple_count(tdata->tg_oldtable);
 		enr->reldata = tdata->tg_oldtable;
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 647b102be1..44d1e12a81 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -114,6 +114,10 @@ typedef struct RI_ConstraintInfo
 	Oid			pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */
 	Oid			pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */
 	Oid			ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */
+	TupleTableSlot *slot_pk;	/* slot for PK attributes */
+	TupleTableSlot *slot_fk;	/* slot for FK attributes */
+	TupleTableSlot *slot_both;	/* both OLD and NEW versions of PK table row */
+	MemoryContext slot_mcxt;	/* the slots will exist in this context */
 	dlist_node	valid_link;		/* Link in list of valid entries */
 } RI_ConstraintInfo;
 
@@ -173,11 +177,29 @@ static int	ri_constraint_cache_valid_count = 0;
 /*
  * Local function prototypes
  */
+static char *RI_FKey_check_query_single_row(const RI_ConstraintInfo *riinfo,
+											Relation fk_rel, Relation pk_rel,
+											Oid *paramtypes);
+static bool RI_FKey_check_query_required(Trigger *trigger, Relation fk_rel,
+										 TupleTableSlot *newslot);
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 							  TupleTableSlot *oldslot,
 							  const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_restrict(TriggerData *trigdata, bool is_no_action,
+						 TupleTableSlot *oldslot);
+static char *ri_restrict_query_single_row(const RI_ConstraintInfo *riinfo,
+										  Relation fk_rel,
+										  Relation pk_rel, Oid *paramtypes);
+static char *ri_cascade_del_query_single_row(const RI_ConstraintInfo *riinfo,
+											 Relation fk_rel, Relation pk_rel,
+											 Oid *paramtypes);
+static char *ri_cascade_upd_query_single_row(const RI_ConstraintInfo *riinfo,
+											 Relation fk_rel, Relation pk_rel,
+											 Oid *paramtypes);
 static Datum ri_set(TriggerData *trigdata, bool is_set_null);
+static char *ri_set_query_single_row(const RI_ConstraintInfo *riinfo,
+									 Relation fk_rel, Relation pk_rel,
+									 Oid *paramtypes, bool is_set_null);
 static void quoteOneName(char *buffer, const char *name);
 static void quoteRelationName(char *buffer, Relation rel);
 static char *ri_ColNameQuoted(const char *tabname, const char *attname);
@@ -200,6 +222,7 @@ static void ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
 							const char *rtabname, Relation rrel,
 							const int16 *rattnums, const Oid *eq_oprs,
 							GenQualParams params, Oid *paramtypes);
+
 static void ri_GenerateQualComponent(StringInfo buf,
 									 const char *sep,
 									 const char *leftop, Oid leftoptype,
@@ -207,7 +230,8 @@ static void ri_GenerateQualComponent(StringInfo buf,
 									 const char *rightop, Oid rightoptype);
 static void ri_GenerateQualCollation(StringInfo buf, Oid collation);
 static int	ri_NullCheck(TupleDesc tupdesc, TupleTableSlot *slot,
-						 const RI_ConstraintInfo *riinfo, bool rel_is_pk);
+						 const RI_ConstraintInfo *riinfo, bool rel_is_pk,
+						 bool ignore_attnums);
 static void ri_BuildQueryKey(RI_QueryKey *key,
 							 const RI_ConstraintInfo *riinfo,
 							 int32 constr_queryno);
@@ -226,22 +250,36 @@ static void ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname,
 							int tgkind);
 static const RI_ConstraintInfo *ri_FetchConstraintInfo(Trigger *trigger,
 													   Relation trig_rel, bool rel_is_pk);
-static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid);
+static const RI_ConstraintInfo *ri_LoadConstraintInfo(Oid constraintOid,
+													  Relation trig_rel,
+													  bool rel_is_pk);
 static SPIPlanPtr ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
-							   RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel);
+							   RI_QueryKey *qkey,
+							   Relation fk_rel, Relation pk_rel);
 static bool ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 							RI_QueryKey *qkey, SPIPlanPtr qplan,
 							Relation fk_rel, Relation pk_rel,
-							TupleTableSlot *oldslot, TupleTableSlot *newslot,
+							TupleTableSlot *oldslot,
 							bool detectNewRows, int expect_OK);
-static void ri_ExtractValues(Relation rel, TupleTableSlot *slot,
-							 const RI_ConstraintInfo *riinfo, bool rel_is_pk,
+static void ri_ExtractValues(TupleTableSlot *slot, int first, int nkeys,
 							 Datum *vals, char *nulls);
 static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 							   Relation pk_rel, Relation fk_rel,
 							   TupleTableSlot *violatorslot, TupleDesc tupdesc,
 							   int queryno, bool partgone) pg_attribute_noreturn();
-
+static Tuplestorestate *get_event_tuplestore(TriggerData *trigdata, int nkeys,
+											 const int16 *attnums, bool old,
+											 TupleDesc tupdesc, Snapshot snapshot);
+static Tuplestorestate *get_event_tuplestore_for_cascade_update(TriggerData *trigdata,
+																const RI_ConstraintInfo *riinfo);
+static void add_key_attrs_to_tupdesc(TupleDesc tupdesc, Relation rel,
+									 const RI_ConstraintInfo *riinfo, int16 *attnums,
+									 int first, bool generate_attnames);
+static void add_key_values(TupleTableSlot *slot,
+						   const RI_ConstraintInfo *riinfo,
+						   Relation rel, ItemPointer ip,
+						   Datum *key_values, bool *key_nulls,
+						   Datum *values, bool *nulls, int first);
 
 /*
  * RI_FKey_check -
@@ -254,29 +292,17 @@ RI_FKey_check(TriggerData *trigdata)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *newslot;
+	bool		is_insert;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable = NULL;
+	Tuplestorestate *newtable = NULL;
+	Tuplestorestate *table;
+	TupleTableSlot *slot = NULL;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, false);
 
-	if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
-		newslot = trigdata->tg_newslot;
-	else
-		newslot = trigdata->tg_trigslot;
-
-	/*
-	 * We should not even consider checking the row if it is no longer valid,
-	 * since it was either deleted (so the deferred check should be skipped)
-	 * or updated (in which case only the latest version of the row should be
-	 * checked).  Test its liveness according to SnapshotSelf.  We need pin
-	 * and lock on the buffer to call HeapTupleSatisfiesVisibility.  Caller
-	 * should be holding pin, but not lock.
-	 */
-	if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot, SnapshotSelf))
-		return PointerGetDatum(NULL);
-
 	/*
 	 * Get the relation descriptors of the FK and PK tables.
 	 *
@@ -286,7 +312,142 @@ RI_FKey_check(TriggerData *trigdata)
 	fk_rel = trigdata->tg_relation;
 	pk_rel = table_open(riinfo->pk_relid, RowShareLock);
 
-	switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false))
+	if (SPI_connect() != SPI_OK_CONNECT)
+		elog(ERROR, "SPI_connect failed");
+
+	/* Fetch or prepare a saved plan for the real check */
+	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
+	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+	{
+		char	   *query;
+		Oid			paramtypes[RI_MAX_NUMKEYS];
+
+		query = RI_FKey_check_query_single_row(riinfo, fk_rel, pk_rel,
+											   paramtypes);
+
+		/* Prepare and save the plan */
+		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey, fk_rel,
+							 pk_rel);
+	}
+
+	/*
+	 * Retrieve the changed rows and put them into the appropriate tuplestore.
+	 */
+	is_insert = TRIGGER_FIRED_BY_INSERT(trigdata->tg_event);
+	if (is_insert)
+	{
+		if (trigdata->ri_tids_old)
+			oldtable = get_event_tuplestore(trigdata,
+											riinfo->nkeys,
+											riinfo->fk_attnums,
+											true,
+											riinfo->slot_fk->tts_tupleDescriptor,
+											SnapshotSelf);
+		else
+		{
+			/* The table is passed by the caller if not called from trigger.c */
+			oldtable = trigdata->tg_oldtable;
+		}
+		table = oldtable;
+	}
+	else
+	{
+		Assert((TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event)));
+
+		if (trigdata->ri_tids_new)
+			newtable = get_event_tuplestore(trigdata,
+											riinfo->nkeys,
+											riinfo->fk_attnums,
+											false,
+											riinfo->slot_fk->tts_tupleDescriptor,
+											SnapshotSelf);
+		else
+		{
+			/* The table is passed by the caller if not called from trigger.c */
+			newtable = trigdata->tg_newtable;
+		}
+		table = newtable;
+	}
+
+	/*
+	 * Retrieve and check the inserted / updated rows, one after another.
+	 */
+	slot = riinfo->slot_fk;
+	while (tuplestore_gettupleslot(table, true, false, slot))
+	{
+		if (!ri_PerformCheck(riinfo, &qkey, qplan,
+							 fk_rel, pk_rel,
+							 slot,
+							 false,
+							 SPI_OK_SELECT))
+			ri_ReportViolation(riinfo,
+							   pk_rel, fk_rel,
+							   slot,
+							   NULL,
+							   qkey.constr_queryno, false);
+	}
+
+	if (SPI_finish() != SPI_OK_FINISH)
+		elog(ERROR, "SPI_finish failed");
+
+	table_close(pk_rel, RowShareLock);
+
+	return PointerGetDatum(NULL);
+}
+
+/* ----------
+ * Like RI_FKey_check_query(), but check a single row.
+ *
+ * The query string built is
+ *	SELECT 1 FROM [ONLY] <pktable> p WHERE pkatt1 = $1 [AND ...]
+ *		   FOR KEY SHARE OF p
+ * The type id's for the $ parameters are those of the
+ * corresponding FK attributes.
+ *
+ * The query is quite a bit simpler than the one for bulk processing, and so
+ * it should execute faster.
+ *
+ * "paramtypes" will receive types of the query parameters (FK attributes).
+ * ----------
+ */
+static char *
+RI_FKey_check_query_single_row(const RI_ConstraintInfo *riinfo,
+							   Relation fk_rel, Relation pk_rel,
+							   Oid *paramtypes)
+{
+	StringInfo	querybuf = makeStringInfo();
+	const char *pk_only;
+	char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
+
+	pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(pkrelname, pk_rel);
+	appendStringInfo(querybuf, "SELECT 1 FROM %s%s p WHERE ",
+					 pk_only, pkrelname);
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					NULL, pk_rel, riinfo->pk_attnums,
+					NULL, fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_RIGHT, paramtypes);
+	appendStringInfoString(querybuf, " FOR KEY SHARE OF p");
+
+	return querybuf->data;
+}
+
+/*
+ * Check if the PK table needs to be queried (using the query generated by
+ * RI_FKey_check_query).
+ */
+static bool
+RI_FKey_check_query_required(Trigger *trigger, Relation fk_rel,
+							 TupleTableSlot *newslot)
+{
+	const RI_ConstraintInfo *riinfo;
+
+	riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
+
+	switch (ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false,
+						 false))
 	{
 		case RI_KEYS_ALL_NULL:
 
@@ -294,8 +455,7 @@ RI_FKey_check(TriggerData *trigdata)
 			 * No further check needed - an all-NULL key passes every type of
 			 * foreign key constraint.
 			 */
-			table_close(pk_rel, RowShareLock);
-			return PointerGetDatum(NULL);
+			return false;
 
 		case RI_KEYS_SOME_NULL:
 
@@ -319,8 +479,7 @@ RI_FKey_check(TriggerData *trigdata)
 							 errdetail("MATCH FULL does not allow mixing of null and nonnull key values."),
 							 errtableconstraint(fk_rel,
 												NameStr(riinfo->conname))));
-					table_close(pk_rel, RowShareLock);
-					return PointerGetDatum(NULL);
+					break;
 
 				case FKCONSTR_MATCH_SIMPLE:
 
@@ -328,17 +487,16 @@ RI_FKey_check(TriggerData *trigdata)
 					 * MATCH SIMPLE - if ANY column is null, the key passes
 					 * the constraint.
 					 */
-					table_close(pk_rel, RowShareLock);
-					return PointerGetDatum(NULL);
+					return false;
 
 #ifdef NOT_USED
 				case FKCONSTR_MATCH_PARTIAL:
 
 					/*
 					 * MATCH PARTIAL - all non-null columns must match. (not
-					 * implemented, can be done by modifying the query below
-					 * to only include non-null columns, or by writing a
-					 * special version here)
+					 * implemented, can be done by modifying the query to only
+					 * include non-null columns, or by writing a special
+					 * version)
 					 */
 					break;
 #endif
@@ -347,71 +505,12 @@ RI_FKey_check(TriggerData *trigdata)
 		case RI_KEYS_NONE_NULL:
 
 			/*
-			 * Have a full qualified key - continue below for all three kinds
-			 * of MATCH.
+			 * Have a full qualified key - regular check is needed.
 			 */
 			break;
 	}
 
-	if (SPI_connect() != SPI_OK_CONNECT)
-		elog(ERROR, "SPI_connect failed");
-
-	/* Fetch or prepare a saved plan for the real check */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
-	{
-		StringInfoData querybuf;
-		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
-		Oid			queryoids[RI_MAX_NUMKEYS];
-		const char *pk_only;
-
-		/* ----------
-		 * The query string built is
-		 *	SELECT 1 FROM [ONLY] <pktable> x WHERE pkatt1 = $1 [AND ...]
-		 *		   FOR KEY SHARE OF x
-		 * The type id's for the $ parameters are those of the
-		 * corresponding FK attributes.
-		 * ----------
-		 */
-		initStringInfo(&querybuf);
-		pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(pkrelname, pk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s p WHERE ",
-						 pk_only, pkrelname);
-		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
-						NULL, pk_rel, riinfo->pk_attnums,
-						NULL, fk_rel, riinfo->fk_attnums,
-						riinfo->pf_eq_oprs,
-						GQ_PARAMS_RIGHT, queryoids);
-		appendStringInfoString(&querybuf, " FOR KEY SHARE OF p");
-
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
-	}
-
-	/*
-	 * Now check that foreign key exists in PK table
-	 */
-	if (!ri_PerformCheck(riinfo, &qkey, qplan,
-						 fk_rel, pk_rel,
-						 NULL, newslot,
-						 false,
-						 SPI_OK_SELECT))
-		ri_ReportViolation(riinfo,
-						   pk_rel, fk_rel,
-						   newslot,
-						   NULL,
-						   qkey.constr_queryno, false);
-
-	if (SPI_finish() != SPI_OK_FINISH)
-		elog(ERROR, "SPI_finish failed");
-
-	table_close(pk_rel, RowShareLock);
-
-	return PointerGetDatum(NULL);
+	return true;
 }
 
 
@@ -467,7 +566,8 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 	bool		result;
 
 	/* Only called for non-null rows */
-	Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) == RI_KEYS_NONE_NULL);
+	Assert(ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true,
+						true) == RI_KEYS_NONE_NULL);
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
@@ -480,10 +580,10 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
-		StringInfoData querybuf;
+		StringInfo	querybuf = makeStringInfo();
 		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
 		const char *pk_only;
-		Oid			queryoids[RI_MAX_NUMKEYS];
+		Oid			paramtypes[RI_MAX_NUMKEYS];
 
 		/* ----------
 		 * The query string built is
@@ -493,23 +593,23 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 		 * PK attributes themselves.
 		 * ----------
 		 */
-		initStringInfo(&querybuf);
 		pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
 			"" : "ONLY ";
 		quoteRelationName(pkrelname, pk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x WHERE ",
+		appendStringInfo(querybuf, "SELECT 1 FROM %s%s x WHERE ",
 						 pk_only, pkrelname);
 
-		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
+		ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
 						NULL, pk_rel, riinfo->pk_attnums,
 						NULL, fk_rel, riinfo->fk_attnums,
 						riinfo->pf_eq_oprs,
 						GQ_PARAMS_RIGHT,
-						queryoids);
-		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
+						paramtypes);
+
+		appendStringInfoString(querybuf, " FOR KEY SHARE OF x");
 
 		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
+		qplan = ri_PlanCheck(querybuf->data, riinfo->nkeys, paramtypes,
 							 &qkey, fk_rel, pk_rel);
 	}
 
@@ -518,7 +618,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 	 */
 	result = ri_PerformCheck(riinfo, &qkey, qplan,
 							 fk_rel, pk_rel,
-							 oldslot, NULL,
+							 oldslot,
 							 true,	/* treat like update */
 							 SPI_OK_SELECT);
 
@@ -543,7 +643,7 @@ RI_FKey_noaction_del(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_noaction_del", RI_TRIGTYPE_DELETE);
 
 	/* Share code with RESTRICT/UPDATE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, true);
+	return ri_restrict((TriggerData *) fcinfo->context, true, NULL);
 }
 
 /*
@@ -563,7 +663,7 @@ RI_FKey_restrict_del(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_restrict_del", RI_TRIGTYPE_DELETE);
 
 	/* Share code with NO ACTION/UPDATE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, false);
+	return ri_restrict((TriggerData *) fcinfo->context, false, NULL);
 }
 
 /*
@@ -580,7 +680,7 @@ RI_FKey_noaction_upd(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_noaction_upd", RI_TRIGTYPE_UPDATE);
 
 	/* Share code with RESTRICT/DELETE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, true);
+	return ri_restrict((TriggerData *) fcinfo->context, true, NULL);
 }
 
 /*
@@ -600,7 +700,7 @@ RI_FKey_restrict_upd(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_restrict_upd", RI_TRIGTYPE_UPDATE);
 
 	/* Share code with NO ACTION/DELETE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, false);
+	return ri_restrict((TriggerData *) fcinfo->context, false, NULL);
 }
 
 /*
@@ -608,16 +708,20 @@ RI_FKey_restrict_upd(PG_FUNCTION_ARGS)
  *
  * Common code for ON DELETE RESTRICT, ON DELETE NO ACTION,
  * ON UPDATE RESTRICT, and ON UPDATE NO ACTION.
+ *
+ * If NULL is passed for oldslot, retrieve the rows from
+ * trigdata->ri_tids_old.
  */
 static Datum
-ri_restrict(TriggerData *trigdata, bool is_no_action)
+ri_restrict(TriggerData *trigdata, bool is_no_action, TupleTableSlot *oldslot)
 {
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable = NULL;
+	bool		first_tuple;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, true);
@@ -630,79 +734,76 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowShareLock);
 	pk_rel = trigdata->tg_relation;
-	oldslot = trigdata->tg_trigslot;
-
-	/*
-	 * If another PK row now exists providing the old key values, we should
-	 * not do anything.  However, this check should only be made in the NO
-	 * ACTION case; in RESTRICT cases we don't wish to allow another row to be
-	 * substituted.
-	 */
-	if (is_no_action &&
-		ri_Check_Pk_Match(pk_rel, fk_rel, oldslot, riinfo))
-	{
-		table_close(fk_rel, RowShareLock);
-		return PointerGetDatum(NULL);
-	}
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
-	/*
-	 * Fetch or prepare a saved plan for the restrict lookup (it's the same
-	 * query for delete and update cases)
-	 */
+	/* Fetch or prepare a saved plan for the restrict lookup */
 	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF);
-
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
-		StringInfoData querybuf;
-		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		Oid			queryoids[RI_MAX_NUMKEYS];
-		const char *fk_only;
-
-		/* ----------
-		 * The query string built is
-		 *	SELECT 1 FROM [ONLY] <fktable> x WHERE $1 = fkatt1 [AND ...]
-		 *		   FOR KEY SHARE OF x
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.
-		 * ----------
-		 */
-		initStringInfo(&querybuf);
-		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x WHERE ",
-						 fk_only, fkrelname);
-
-		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
-						NULL, pk_rel, riinfo->pk_attnums,
-						NULL, fk_rel, riinfo->fk_attnums,
-						riinfo->pf_eq_oprs,
-						GQ_PARAMS_LEFT,
-						queryoids);
+		char	   *query;
+		Oid			paramtypes[RI_MAX_NUMKEYS];
 
-		appendStringInfoString(&querybuf, " FOR KEY SHARE OF x");
+		query = ri_restrict_query_single_row(riinfo, fk_rel, pk_rel,
+											 paramtypes);
 
 		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
+		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
+							 fk_rel, pk_rel);
 	}
 
+	if (oldslot == NULL)
+	{
+		oldtable = get_event_tuplestore(trigdata,
+										riinfo->nkeys,
+										riinfo->pk_attnums,
+										true,
+										riinfo->slot_pk->tts_tupleDescriptor,
+										NULL);
+		oldslot = riinfo->slot_pk;
+	}
+
+	first_tuple = true;
+
 	/*
-	 * We have a plan now. Run it to check for existing references.
+	 * Retrieve and check the rows, one after another.
+	 *
+	 * One tuple should always be processed: if there's no "oldtable", a
+	 * valid "oldslot" must have been passed.
 	 */
-	if (ri_PerformCheck(riinfo, &qkey, qplan,
-						fk_rel, pk_rel,
-						oldslot, NULL,
-						true,	/* must detect new rows */
-						SPI_OK_SELECT))
-		ri_ReportViolation(riinfo,
-						   pk_rel, fk_rel,
-						   oldslot,
-						   NULL,
-						   qkey.constr_queryno, false);
+	while ((oldtable && tuplestore_gettupleslot(oldtable, true, false, oldslot))
+		   || first_tuple)
+	{
+		first_tuple = false;
+
+		/*
+		 * If another PK row now exists providing the old key values, we
+		 * should not do anything.  However, this check should only be made in
+		 * the NO ACTION case; in RESTRICT cases we don't wish to allow
+		 * another row to be substituted.
+		 */
+		if (is_no_action &&
+			ri_Check_Pk_Match(pk_rel, fk_rel, oldslot, riinfo))
+		{
+			/* Done if the caller passed a single row rather than a tuplestore. */
+			if (oldtable == NULL)
+				break;
+
+			continue;
+		}
+
+		if (ri_PerformCheck(riinfo, &qkey, qplan,
+							fk_rel, pk_rel,
+							oldslot,
+							true,	/* must detect new rows */
+							SPI_OK_SELECT))
+			ri_ReportViolation(riinfo,
+							   pk_rel, fk_rel,
+							   oldslot,
+							   NULL,
+							   qkey.constr_queryno, false);
+
+		/* Done if the caller passed a single row rather than a tuplestore. */
+		if (oldtable == NULL)
+			break;
+	}
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -712,6 +813,44 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
 	return PointerGetDatum(NULL);
 }
 
+/*
+ * Like ri_restrict_query(), but check a single row.
+ */
+static char *
+ri_restrict_query_single_row(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+							 Relation pk_rel, Oid *paramtypes)
+{
+	StringInfo	querybuf = makeStringInfo();
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *fk_only;
+
+	/* ----------
+	 * The query string built is
+	 *
+	 *	SELECT 1 FROM [ONLY] <fktable> x WHERE $1 = fkatt1 [AND ...]
+	 *		   FOR KEY SHARE OF x
+	 *
+	 * The type id's for the $ parameters are those of the
+	 * corresponding PK attributes.
+	 * ----------
+	 */
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfo(querybuf, "SELECT 1 FROM %s%s x WHERE ",
+					 fk_only, fkrelname);
+
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					NULL, pk_rel, riinfo->pk_attnums,
+					NULL, fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_LEFT,
+					paramtypes);
+
+	appendStringInfoString(querybuf, " FOR KEY SHARE OF x");
+
+	return querybuf->data;
+}
 
 /*
  * RI_FKey_cascade_del -
@@ -725,9 +864,10 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable;
+	TupleTableSlot *oldslot;
 
 	/* Check that this is a valid trigger call on the right time and event. */
 	ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -743,56 +883,46 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
-	oldslot = trigdata->tg_trigslot;
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
-	/* Fetch or prepare a saved plan for the cascaded delete */
 	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_DEL_DODELETE);
 
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
-		StringInfoData querybuf;
-		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		Oid			queryoids[RI_MAX_NUMKEYS];
-		const char *fk_only;
-
-		/* ----------
-		 * The query string built is
-		 *	DELETE FROM [ONLY] <fktable> WHERE $1 = fkatt1 [AND ...]
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.
-		 * ----------
-		 */
-		initStringInfo(&querybuf);
-		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(fkrelname, fk_rel);
-
-		appendStringInfo(&querybuf, "DELETE FROM %s%s WHERE ", fk_only,
-						 fkrelname);
-		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
-						NULL, pk_rel, riinfo->pk_attnums,
-						NULL, fk_rel, riinfo->fk_attnums,
-						riinfo->pf_eq_oprs,
-						GQ_PARAMS_LEFT,
-						queryoids);
+		Oid			paramtypes[RI_MAX_NUMKEYS];
+		char	   *query = ri_cascade_del_query_single_row(riinfo,
+															fk_rel,
+															pk_rel,
+															paramtypes);
 
 		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
+		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
+							 fk_rel, pk_rel);
 	}
 
-	/*
-	 * We have a plan now. Build up the arguments from the key values in the
-	 * deleted PK tuple and delete the referencing rows
-	 */
-	ri_PerformCheck(riinfo, &qkey, qplan,
-					fk_rel, pk_rel,
-					oldslot, NULL,
-					true,		/* must detect new rows */
-					SPI_OK_DELETE);
+	oldtable = get_event_tuplestore(trigdata,
+									riinfo->nkeys,
+									riinfo->pk_attnums,
+									true,
+									riinfo->slot_pk->tts_tupleDescriptor,
+									NULL);
+
+	/* Retrieve and check the rows, one after another. */
+	oldslot = riinfo->slot_pk;
+	while (tuplestore_gettupleslot(oldtable, true, false, oldslot))
+	{
+		/*
+		 * We have a plan now. Build up the arguments from the key values in
+		 * the deleted PK tuple and delete the referencing rows
+		 */
+		ri_PerformCheck(riinfo, &qkey, qplan,
+						fk_rel, pk_rel,
+						oldslot,
+						true,	/* must detect new rows */
+						SPI_OK_DELETE);
+	}
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -802,6 +932,41 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	return PointerGetDatum(NULL);
 }
 
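+/*
+ * Build the query used by RI_FKey_cascade_del() to delete the rows that
+ * reference a single PK row.
+ */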
+static char *
+ri_cascade_del_query_single_row(const RI_ConstraintInfo *riinfo,
+								Relation fk_rel, Relation pk_rel,
+								Oid *paramtypes)
+{
+	StringInfo	querybuf = makeStringInfo();
+	const char *fk_only;
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+
+	/* ----------
+	 * The query string built is
+	 *
+	 *	DELETE FROM [ONLY] <fktable> WHERE $1 = fkatt1 [AND ...]
+	 *
+	 * The type id's for the $ parameters are those of the
+	 * corresponding PK attributes.
+	 * ----------
+	 */
+
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+
+	appendStringInfo(querybuf, "DELETE FROM %s%s WHERE ", fk_only,
+					 fkrelname);
+
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					NULL, pk_rel, riinfo->pk_attnums,
+					NULL, fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_LEFT,
+					paramtypes);
+
+	return querybuf->data;
+}
 
 /*
  * RI_FKey_cascade_upd -
@@ -815,10 +980,10 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *newslot;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *newtable;
+	TupleTableSlot *slot;
 
 	/* Check that this is a valid trigger call on the right time and event. */
 	ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -835,85 +1000,44 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
-	newslot = trigdata->tg_newslot;
-	oldslot = trigdata->tg_trigslot;
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
 	/* Fetch or prepare a saved plan for the cascaded update */
 	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_UPD_DOUPDATE);
-
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
-		StringInfoData querybuf;
-		StringInfoData qualbuf;
-		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		char		attname[MAX_QUOTED_NAME_LEN];
-		char		paramname[16];
-		const char *querysep;
-		const char *qualsep;
-		Oid			queryoids[RI_MAX_NUMKEYS * 2];
-		const char *fk_only;
-
-		/* ----------
-		 * The query string built is
-		 *	UPDATE [ONLY] <fktable> SET fkatt1 = $1 [, ...]
-		 *			WHERE $n = fkatt1 [AND ...]
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.  Note that we are assuming
-		 * there is an assignment cast from the PK to the FK type;
-		 * else the parser will fail.
-		 * ----------
-		 */
-		initStringInfo(&querybuf);
-		initStringInfo(&qualbuf);
-		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "UPDATE %s%s SET",
-						 fk_only, fkrelname);
-		querysep = "";
-		qualsep = "WHERE";
-		for (int i = 0, j = riinfo->nkeys; i < riinfo->nkeys; i++, j++)
-		{
-			Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
-			Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
-			Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
-
-			quoteOneName(attname,
-						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
-			appendStringInfo(&querybuf,
-							 "%s %s = $%d",
-							 querysep, attname, i + 1);
-			sprintf(paramname, "$%d", j + 1);
-			ri_GenerateQualComponent(&qualbuf, qualsep,
-									 paramname, pk_type,
-									 riinfo->pf_eq_oprs[i],
-									 attname, fk_type);
-			if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
-				ri_GenerateQualCollation(&querybuf, pk_coll);
-			querysep = ",";
-			qualsep = "AND";
-			queryoids[i] = pk_type;
-			queryoids[j] = pk_type;
-		}
-		appendBinaryStringInfo(&querybuf, qualbuf.data, qualbuf.len);
+		Oid			paramtypes[RI_MAX_NUMKEYS * 2];
+		char	   *query = ri_cascade_upd_query_single_row(riinfo,
+															fk_rel,
+															pk_rel,
+															paramtypes);
 
 		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys * 2, queryoids,
-							 &qkey, fk_rel, pk_rel);
+		qplan = ri_PlanCheck(query, 2 * riinfo->nkeys, paramtypes, &qkey,
+							 fk_rel, pk_rel);
 	}
 
 	/*
-	 * We have a plan now. Run it to update the existing references.
+	 * In this case, both old and new values should be in the same tuplestore
+	 * because there's no useful join column.
 	 */
-	ri_PerformCheck(riinfo, &qkey, qplan,
-					fk_rel, pk_rel,
-					oldslot, newslot,
-					true,		/* must detect new rows */
-					SPI_OK_UPDATE);
+	newtable = get_event_tuplestore_for_cascade_update(trigdata, riinfo);
+
+	/* Retrieve and check the rows, one after another. */
+	slot = riinfo->slot_both;
+	while (tuplestore_gettupleslot(newtable, true, false, slot))
+	{
+		/*
+		 * We have a plan now. Run it to update the existing references.
+		 */
+		ri_PerformCheck(riinfo, &qkey, qplan,
+						fk_rel, pk_rel,
+						slot,
+						true,	/* must detect new rows */
+						SPI_OK_UPDATE);
+	}
 
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
@@ -923,6 +1047,69 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	return PointerGetDatum(NULL);
 }
 
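+/*
+ * Build the query used by RI_FKey_cascade_upd() to update the rows that
+ * reference a single PK row.
+ */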
+static char *
+ri_cascade_upd_query_single_row(const RI_ConstraintInfo *riinfo,
+								Relation fk_rel, Relation pk_rel,
+								Oid *paramtypes)
+{
+	StringInfo	querybuf = makeStringInfo();
+	StringInfo	qualbuf = makeStringInfo();
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	char		attname[MAX_QUOTED_NAME_LEN];
+	char		paramname[16];
+	const char *querysep;
+	const char *qualsep;
+	const char *fk_only;
+
+	/* ----------
+	 * The query string built is
+	 *
+	 *	UPDATE [ONLY] <fktable> SET fkatt1 = $1 [, ...]
+	 *			WHERE $n = fkatt1 [AND ...]
+	 *
+	 * The type id's for the $ parameters are those of the
+	 * corresponding PK attributes.  Note that we are assuming
+	 * there is an assignment cast from the PK to the FK type;
+	 * else the parser will fail.
+	 * ----------
+	 */
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfo(querybuf, "UPDATE %s%s SET",
+					 fk_only, fkrelname);
+	querysep = "";
+	qualsep = "WHERE";
+	for (int i = 0, j = riinfo->nkeys; i < riinfo->nkeys; i++, j++)
+	{
+		Oid			pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
+		Oid			fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+		Oid			pk_coll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
+		Oid			fk_coll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
+
+		quoteOneName(attname,
+					 RIAttName(fk_rel, riinfo->fk_attnums[i]));
+		appendStringInfo(querybuf,
+						 "%s %s = $%d",
+						 querysep, attname, i + 1);
+		sprintf(paramname, "$%d", j + 1);
+		ri_GenerateQualComponent(qualbuf, qualsep,
+								 paramname, pk_type,
+								 riinfo->pf_eq_oprs[i],
+								 attname, fk_type);
+
+		if (pk_coll != fk_coll && !get_collation_isdeterministic(pk_coll))
+			ri_GenerateQualCollation(querybuf, pk_coll);
+
+		querysep = ",";
+		qualsep = "AND";
+		paramtypes[i] = pk_type;
+		paramtypes[j] = pk_type;
+	}
+	appendBinaryStringInfo(querybuf, qualbuf->data, qualbuf->len);
+
+	return querybuf->data;
+}
 
 /*
  * RI_FKey_setnull_del -
@@ -996,9 +1183,10 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
-	TupleTableSlot *oldslot;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
+	Tuplestorestate *oldtable;
+	TupleTableSlot *oldslot;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, true);
@@ -1011,7 +1199,6 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	 */
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
-	oldslot = trigdata->tg_trigslot;
 
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
@@ -1024,90 +1211,110 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 					 (is_set_null
 					  ? RI_PLAN_SETNULL_DOUPDATE
 					  : RI_PLAN_SETDEFAULT_DOUPDATE));
-
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
-		StringInfoData querybuf;
-		char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
-		Oid			queryoids[RI_MAX_NUMKEYS];
-		const char *fk_only;
+		Oid			paramtypes[RI_MAX_NUMKEYS];
+		char	   *query = ri_set_query_single_row(riinfo, fk_rel, pk_rel,
+													paramtypes, is_set_null);
 
-		/* ----------
-		 * The query string built is
-		 *	UPDATE [ONLY] <fktable> SET fkatt1 = {NULL|DEFAULT} [, ...]
-		 *			WHERE $1 = fkatt1 [AND ...]
-		 * The type id's for the $ parameters are those of the
-		 * corresponding PK attributes.
-		 * ----------
-		 */
-		initStringInfo(&querybuf);
-		fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
-			"" : "ONLY ";
-		quoteRelationName(fkrelname, fk_rel);
-		appendStringInfo(&querybuf, "UPDATE %s%s SET",
-						 fk_only, fkrelname);
+		/* Prepare and save the plan */
+		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
+							 fk_rel, pk_rel);
+	}
 
-		for (int i = 0; i < riinfo->nkeys; i++)
-		{
-			char		attname[MAX_QUOTED_NAME_LEN];
-			const char *sep = i > 0 ? "," : "";
+	oldtable = get_event_tuplestore(trigdata,
+									riinfo->nkeys,
+									riinfo->pk_attnums,
+									true,
+									riinfo->slot_pk->tts_tupleDescriptor,
+									NULL);
 
-			quoteOneName(attname,
-						 RIAttName(fk_rel, riinfo->fk_attnums[i]));
+	/* The query needs parameters, so retrieve them now. */
+	oldslot = riinfo->slot_pk;
+	while (tuplestore_gettupleslot(oldtable, true, false, oldslot))
+	{
+		/*
+		 * We have a plan now. Run it to update the existing references.
+		 */
+		ri_PerformCheck(riinfo, &qkey, qplan,
+						fk_rel, pk_rel,
+						oldslot,
+						true,	/* must detect new rows */
+						SPI_OK_UPDATE);
 
-			appendStringInfo(&querybuf,
-							 "%s %s = %s",
-							 sep, attname,
-							 is_set_null ? "NULL" : "DEFAULT");
+		if (!is_set_null)
+		{
+			/*
+			 * If we just deleted or updated the PK row whose key was equal to
+			 * the FK columns' default values, and a referencing row exists in
+			 * the FK table, we would have updated that row to the same values
+			 * it already had --- and RI_FKey_fk_upd_check_required would
+			 * hence believe no check is necessary.  So we need to do another
+			 * lookup now and in case a reference still exists, abort the
+			 * operation.  That is already implemented in the NO ACTION
+			 * trigger, so just run it. (This recheck is only needed in the
+			 * SET DEFAULT case, since CASCADE would remove such rows in case
+			 * of a DELETE operation or would change the FK key values in case
+			 * of an UPDATE, while SET NULL is certain to result in rows that
+			 * satisfy the FK constraint.)
+			 */
+			ri_restrict(trigdata, true, oldslot);
 		}
-
-		appendStringInfo(&querybuf, " WHERE ");
-		ri_GenerateQual(&querybuf, "AND", riinfo->nkeys,
-						NULL, pk_rel, riinfo->pk_attnums,
-						NULL, fk_rel, riinfo->fk_attnums,
-						riinfo->pf_eq_oprs,
-						GQ_PARAMS_LEFT, queryoids);
-
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
-							 &qkey, fk_rel, pk_rel);
 	}
 
-	/*
-	 * We have a plan now. Run it to update the existing references.
-	 */
-	ri_PerformCheck(riinfo, &qkey, qplan,
-					fk_rel, pk_rel,
-					oldslot, NULL,
-					true,		/* must detect new rows */
-					SPI_OK_UPDATE);
-
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
 
 	table_close(fk_rel, RowExclusiveLock);
 
-	if (is_set_null)
-		return PointerGetDatum(NULL);
-	else
+	return PointerGetDatum(NULL);
+}
+
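+/*
+ * Build the query used by ri_set() to update the rows that reference a
+ * single PK row.
+ */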
+static char *
+ri_set_query_single_row(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+						Relation pk_rel, Oid *paramtypes, bool is_set_null)
+{
+	StringInfo	querybuf = makeStringInfo();
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *fk_only;
+
+	/* ----------
+	 * The query string built is
+	 *	UPDATE [ONLY] <fktable> SET fkatt1 = {NULL|DEFAULT} [, ...]
+	 *			WHERE $1 = fkatt1 [AND ...]
+	 * The type id's for the $ parameters are those of the
+	 * corresponding PK attributes.
+	 * ----------
+	 */
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfo(querybuf, "UPDATE %s%s SET",
+					 fk_only, fkrelname);
+
+	for (int i = 0; i < riinfo->nkeys; i++)
 	{
-		/*
-		 * If we just deleted or updated the PK row whose key was equal to the
-		 * FK columns' default values, and a referencing row exists in the FK
-		 * table, we would have updated that row to the same values it already
-		 * had --- and RI_FKey_fk_upd_check_required would hence believe no
-		 * check is necessary.  So we need to do another lookup now and in
-		 * case a reference still exists, abort the operation.  That is
-		 * already implemented in the NO ACTION trigger, so just run it. (This
-		 * recheck is only needed in the SET DEFAULT case, since CASCADE would
-		 * remove such rows in case of a DELETE operation or would change the
-		 * FK key values in case of an UPDATE, while SET NULL is certain to
-		 * result in rows that satisfy the FK constraint.)
-		 */
-		return ri_restrict(trigdata, true);
+		char		attname[MAX_QUOTED_NAME_LEN];
+		const char *sep = i > 0 ? "," : "";
+
+		quoteOneName(attname,
+					 RIAttName(fk_rel, riinfo->fk_attnums[i]));
+
+		appendStringInfo(querybuf,
+						 "%s %s = %s",
+						 sep, attname,
+						 is_set_null ? "NULL" : "DEFAULT");
 	}
-}
 
+	appendStringInfo(querybuf, " WHERE ");
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					NULL, pk_rel, riinfo->pk_attnums,
+					NULL, fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_LEFT, paramtypes);
+
+	return querybuf->data;
+}
 
 /*
  * RI_FKey_pk_upd_check_required -
@@ -1132,7 +1339,8 @@ RI_FKey_pk_upd_check_required(Trigger *trigger, Relation pk_rel,
 	 * If any old key value is NULL, the row could not have been referenced by
 	 * an FK row, so no check is needed.
 	 */
-	if (ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) != RI_KEYS_NONE_NULL)
+	if (ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true,
+					 false) != RI_KEYS_NONE_NULL)
 		return false;
 
 	/* If all old and new key values are equal, no check is needed */
@@ -1164,7 +1372,8 @@ RI_FKey_fk_upd_check_required(Trigger *trigger, Relation fk_rel,
 
 	riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
 
-	ri_nullcheck = ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false);
+	ri_nullcheck = ri_NullCheck(RelationGetDescr(fk_rel), newslot, riinfo, false,
+								false);
 
 	/*
 	 * If all new key values are NULL, the row satisfies the constraint, so no
@@ -1236,6 +1445,24 @@ RI_FKey_fk_upd_check_required(Trigger *trigger, Relation fk_rel,
 	return true;
 }
 
+/*
+ * RI_FKey_fk_attributes -
+ *
+ * Return a tuple descriptor containing only the FK attributes of the given
+ * FK constraint. In addition, the numbers of the key attributes within the
+ * whole table are stored in *attnums_p.
+ */
+TupleDesc
+RI_FKey_fk_attributes(Trigger *trigger, Relation trig_rel, const int16 **attnums_p)
+{
+	const RI_ConstraintInfo *riinfo;
+
+	riinfo = ri_FetchConstraintInfo(trigger, trig_rel, false);
+	*attnums_p = riinfo->fk_attnums;
+
+	return riinfo->slot_fk->tts_tupleDescriptor;
+}
+
 /*
  * RI_Initial_Check -
  *
@@ -1471,7 +1698,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 		 * disallows partially-null FK rows.
 		 */
 		if (fake_riinfo.confmatchtype == FKCONSTR_MATCH_FULL &&
-			ri_NullCheck(tupdesc, slot, &fake_riinfo, false) != RI_KEYS_NONE_NULL)
+			ri_NullCheck(tupdesc, slot, &fake_riinfo, false, false) != RI_KEYS_NONE_NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
 					 errmsg("insert or update on table \"%s\" violates foreign key constraint \"%s\"",
@@ -1974,7 +2201,7 @@ ri_FetchConstraintInfo(Trigger *trigger, Relation trig_rel, bool rel_is_pk)
 				 errhint("Remove this referential integrity trigger and its mates, then do ALTER TABLE ADD CONSTRAINT.")));
 
 	/* Find or create a hashtable entry for the constraint */
-	riinfo = ri_LoadConstraintInfo(constraintOid);
+	riinfo = ri_LoadConstraintInfo(constraintOid, trig_rel, rel_is_pk);
 
 	/* Do some easy cross-checks against the trigger call data */
 	if (rel_is_pk)
@@ -2010,12 +2237,16 @@ ri_FetchConstraintInfo(Trigger *trigger, Relation trig_rel, bool rel_is_pk)
  * Fetch or create the RI_ConstraintInfo struct for an FK constraint.
  */
 static const RI_ConstraintInfo *
-ri_LoadConstraintInfo(Oid constraintOid)
+ri_LoadConstraintInfo(Oid constraintOid, Relation trig_rel, bool rel_is_pk)
 {
 	RI_ConstraintInfo *riinfo;
 	bool		found;
 	HeapTuple	tup;
 	Form_pg_constraint conForm;
+	MemoryContext oldcxt;
+	TupleDesc	tupdesc;
+	Relation	pk_rel,
+				fk_rel;
 
 	/*
 	 * On the first call initialize the hashtable
@@ -2030,7 +2261,12 @@ ri_LoadConstraintInfo(Oid constraintOid)
 											   (void *) &constraintOid,
 											   HASH_ENTER, &found);
 	if (!found)
+	{
 		riinfo->valid = false;
+		riinfo->slot_mcxt = AllocSetContextCreate(TopMemoryContext,
+												  "RI_ConstraintInfoSlots",
+												  ALLOCSET_SMALL_SIZES);
+	}
 	else if (riinfo->valid)
 		return riinfo;
 
@@ -2067,6 +2303,60 @@ ri_LoadConstraintInfo(Oid constraintOid)
 
 	ReleaseSysCache(tup);
 
+	/*
+	 * Construct auxiliary tuple descriptors containing only the key
+	 * attributes.
+	 */
+	if (rel_is_pk)
+	{
+		pk_rel = trig_rel;
+		fk_rel = table_open(riinfo->fk_relid, AccessShareLock);
+	}
+	else
+	{
+		pk_rel = table_open(riinfo->pk_relid, AccessShareLock);
+		fk_rel = trig_rel;
+	}
+
+	/*
+	 * Use a separate memory context for the slots so that memory does not
+	 * leak if the riinfo needs to be reloaded.
+	 */
+	MemoryContextReset(riinfo->slot_mcxt);
+	oldcxt = MemoryContextSwitchTo(riinfo->slot_mcxt);
+
+	/* The PK attributes. */
+	tupdesc = CreateTemplateTupleDesc(riinfo->nkeys);
+	add_key_attrs_to_tupdesc(tupdesc, pk_rel, riinfo, riinfo->pk_attnums, 1,
+							 false);
+	riinfo->slot_pk = MakeSingleTupleTableSlot(tupdesc, &TTSOpsMinimalTuple);
+
+	/* The FK attributes. */
+	tupdesc = CreateTemplateTupleDesc(riinfo->nkeys);
+	add_key_attrs_to_tupdesc(tupdesc, fk_rel, riinfo, riinfo->fk_attnums, 1,
+							 false);
+	riinfo->slot_fk = MakeSingleTupleTableSlot(tupdesc, &TTSOpsMinimalTuple);
+
+	/*
+	 * The descriptor to store both NEW and OLD tuple into when processing ON
+	 * UPDATE CASCADE.
+	 */
+	tupdesc = CreateTemplateTupleDesc(2 * riinfo->nkeys);
+	/* Add the key attributes for both NEW and OLD. */
+	add_key_attrs_to_tupdesc(tupdesc, pk_rel, riinfo, riinfo->pk_attnums, 1,
+							 true);
+	add_key_attrs_to_tupdesc(tupdesc, pk_rel, riinfo, riinfo->pk_attnums,
+							 riinfo->nkeys + 1, true);
+	riinfo->slot_both = MakeSingleTupleTableSlot(tupdesc,
+												 &TTSOpsMinimalTuple);
+
+	MemoryContextSwitchTo(oldcxt);
+
+	if (rel_is_pk)
+		table_close(fk_rel, AccessShareLock);
+	else
+		table_close(pk_rel, AccessShareLock);
+
 	/*
 	 * For efficient processing of invalidation messages below, we keep a
 	 * doubly-linked list, and a count, of all currently valid entries.
@@ -2133,7 +2423,8 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
  */
 static SPIPlanPtr
 ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
-			 RI_QueryKey *qkey, Relation fk_rel, Relation pk_rel)
+			 RI_QueryKey *qkey, Relation fk_rel,
+			 Relation pk_rel)
 {
 	SPIPlanPtr	qplan;
 	Relation	query_rel;
@@ -2156,7 +2447,7 @@ ri_PlanCheck(const char *querystr, int nargs, Oid *argtypes,
 						   SECURITY_NOFORCE_RLS);
 
 	/* Create the plan */
-	qplan = SPI_prepare(querystr, nargs, argtypes);
+	qplan = SPI_prepare(querystr, nargs, nargs > 0 ? argtypes : NULL);
 
 	if (qplan == NULL)
 		elog(ERROR, "SPI_prepare returned %s for %s", SPI_result_code_string(SPI_result), querystr);
@@ -2178,20 +2469,20 @@ static bool
 ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 				RI_QueryKey *qkey, SPIPlanPtr qplan,
 				Relation fk_rel, Relation pk_rel,
-				TupleTableSlot *oldslot, TupleTableSlot *newslot,
+				TupleTableSlot *slot,
 				bool detectNewRows, int expect_OK)
 {
-	Relation	query_rel,
-				source_rel;
-	bool		source_is_pk;
+	Relation	query_rel;
 	Snapshot	test_snapshot;
 	Snapshot	crosscheck_snapshot;
 	int			limit;
 	int			spi_result;
 	Oid			save_userid;
 	int			save_sec_context;
-	Datum		vals[RI_MAX_NUMKEYS * 2];
-	char		nulls[RI_MAX_NUMKEYS * 2];
+	Datum		vals_loc[RI_MAX_NUMKEYS * 2];
+	char		nulls_loc[RI_MAX_NUMKEYS * 2];
+	Datum	   *vals = NULL;
+	char	   *nulls = NULL;
 
 	/*
 	 * Use the query type code to determine whether the query is run against
@@ -2202,37 +2493,26 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 	else
 		query_rel = fk_rel;
 
-	/*
-	 * The values for the query are taken from the table on which the trigger
-	 * is called - it is normally the other one with respect to query_rel. An
-	 * exception is ri_Check_Pk_Match(), which uses the PK table for both (and
-	 * sets queryno to RI_PLAN_CHECK_LOOKUPPK_FROM_PK).  We might eventually
-	 * need some less klugy way to determine this.
-	 */
-	if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK)
+	if (slot)
 	{
-		source_rel = fk_rel;
-		source_is_pk = false;
-	}
-	else
-	{
-		source_rel = pk_rel;
-		source_is_pk = true;
-	}
+		int			nparams = riinfo->nkeys;
 
-	/* Extract the parameters to be passed into the query */
-	if (newslot)
-	{
-		ri_ExtractValues(source_rel, newslot, riinfo, source_is_pk,
-						 vals, nulls);
-		if (oldslot)
-			ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
-							 vals + riinfo->nkeys, nulls + riinfo->nkeys);
-	}
-	else
-	{
-		ri_ExtractValues(source_rel, oldslot, riinfo, source_is_pk,
-						 vals, nulls);
+		vals = vals_loc;
+		nulls = nulls_loc;
+
+		/* Extract the parameters to be passed into the query */
+		ri_ExtractValues(slot, 0, nparams, vals, nulls);
+
+		if (slot->tts_tupleDescriptor->natts != nparams)
+		{
+			/*
+			 * In a special case (ON UPDATE CASCADE) the slot may contain both
+			 * new and old values of the key.
+			 */
+			Assert(slot->tts_tupleDescriptor->natts == nparams * 2);
+
+			ri_ExtractValues(slot, nparams, nparams, vals, nulls);
+		}
 	}
 
 	/*
@@ -2295,28 +2575,21 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
 						RelationGetRelationName(fk_rel)),
 				 errhint("This is most likely due to a rule having rewritten the query.")));
 
-	return SPI_processed != 0;
+	return SPI_processed > 0;
 }
 
 /*
  * Extract fields from a tuple into Datum/nulls arrays
  */
 static void
-ri_ExtractValues(Relation rel, TupleTableSlot *slot,
-				 const RI_ConstraintInfo *riinfo, bool rel_is_pk,
-				 Datum *vals, char *nulls)
+ri_ExtractValues(TupleTableSlot *slot, int first, int nkeys, Datum *vals,
+				 char *nulls)
 {
-	const int16 *attnums;
 	bool		isnull;
 
-	if (rel_is_pk)
-		attnums = riinfo->pk_attnums;
-	else
-		attnums = riinfo->fk_attnums;
-
-	for (int i = 0; i < riinfo->nkeys; i++)
+	for (int i = first; i < first + nkeys; i++)
 	{
-		vals[i] = slot_getattr(slot, attnums[i], &isnull);
+		vals[i] = slot_getattr(slot, i + 1, &isnull);
 		nulls[i] = isnull ? 'n' : ' ';
 	}
 }
@@ -2345,25 +2618,27 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 	bool		has_perm = true;
 
 	/*
-	 * Determine which relation to complain about.  If tupdesc wasn't passed
-	 * by caller, assume the violator tuple came from there.
+	 * Determine which relation to complain about.
 	 */
-	onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK);
+	onfk = queryno == RI_PLAN_CHECK_LOOKUPPK;
 	if (onfk)
 	{
 		attnums = riinfo->fk_attnums;
 		rel_oid = fk_rel->rd_id;
-		if (tupdesc == NULL)
-			tupdesc = fk_rel->rd_att;
 	}
 	else
 	{
 		attnums = riinfo->pk_attnums;
 		rel_oid = pk_rel->rd_id;
-		if (tupdesc == NULL)
-			tupdesc = pk_rel->rd_att;
 	}
 
+	/*
+	 * If tupdesc wasn't passed by caller, assume the violator tuple matches
+	 * the descriptor of the violatorslot.
+	 */
+	if (tupdesc == NULL)
+		tupdesc = violatorslot->tts_tupleDescriptor;
+
 	/*
 	 * Check permissions- if the user does not have access to view the data in
 	 * any of the key columns then we don't include the errdetail() below.
@@ -2410,8 +2685,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 		initStringInfo(&key_values);
 		for (int idx = 0; idx < riinfo->nkeys; idx++)
 		{
-			int			fnum = attnums[idx];
-			Form_pg_attribute att = TupleDescAttr(tupdesc, fnum - 1);
+			Form_pg_attribute att = TupleDescAttr(tupdesc, idx);
 			char	   *name,
 					   *val;
 			Datum		datum;
@@ -2419,7 +2693,7 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 
 			name = NameStr(att->attname);
 
-			datum = slot_getattr(violatorslot, fnum, &isnull);
+			datum = slot_getattr(violatorslot, idx + 1, &isnull);
 			if (!isnull)
 			{
 				Oid			foutoid;
@@ -2487,24 +2761,34 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
  * Determine the NULL state of all key values in a tuple
  *
  * Returns one of RI_KEYS_ALL_NULL, RI_KEYS_NONE_NULL or RI_KEYS_SOME_NULL.
+ *
+ * If the slot only contains key columns, pass ignore_attnums=true.
  */
 static int
 ri_NullCheck(TupleDesc tupDesc,
 			 TupleTableSlot *slot,
-			 const RI_ConstraintInfo *riinfo, bool rel_is_pk)
+			 const RI_ConstraintInfo *riinfo, bool rel_is_pk,
+			 bool ignore_attnums)
 {
 	const int16 *attnums;
 	bool		allnull = true;
 	bool		nonenull = true;
 
-	if (rel_is_pk)
-		attnums = riinfo->pk_attnums;
-	else
-		attnums = riinfo->fk_attnums;
+	if (!ignore_attnums)
+	{
+		if (rel_is_pk)
+			attnums = riinfo->pk_attnums;
+		else
+			attnums = riinfo->fk_attnums;
+	}
 
 	for (int i = 0; i < riinfo->nkeys; i++)
 	{
-		if (slot_attisnull(slot, attnums[i]))
+		int16		attnum;
+
+		attnum = !ignore_attnums ? attnums[i] : i + 1;
+
+		if (slot_attisnull(slot, attnum))
 			nonenull = false;
 		else
 			allnull = false;
@@ -2888,3 +3172,272 @@ RI_FKey_trigger_type(Oid tgfoid)
 
 	return RI_TRIGGER_NONE;
 }
+
+/*
+ * Turn a TID array into a tuplestore. If a snapshot is passed, only include
+ * tuples visible under that snapshot.
+ */
+static Tuplestorestate *
+get_event_tuplestore(TriggerData *trigdata, int nkeys, const int16 *attnums,
+					 bool old, TupleDesc tupdesc, Snapshot snapshot)
+{
+	ResourceOwner saveResourceOwner;
+	Tuplestorestate *result;
+	TIDArray   *ta;
+	ItemPointer it;
+	TupleTableSlot *slot;
+	int			i;
+	Datum		values[RI_MAX_NUMKEYS];
+	bool		isnull[RI_MAX_NUMKEYS];
+
+	saveResourceOwner = CurrentResourceOwner;
+	CurrentResourceOwner = CurTransactionResourceOwner;
+	result = tuplestore_begin_heap(false, false, work_mem);
+	CurrentResourceOwner = saveResourceOwner;
+
+	/* XXX Shouldn't tg_trigslot and tg_newslot be the same? */
+	if (old)
+	{
+		ta = trigdata->ri_tids_old;
+		slot = trigdata->tg_trigslot;
+	}
+	else
+	{
+		ta = trigdata->ri_tids_new;
+		slot = trigdata->tg_newslot;
+	}
+
+	it = ta->tids;
+	for (i = 0; i < ta->n; i++)
+	{
+		int			j;
+
+		CHECK_FOR_INTERRUPTS();
+
+		ExecClearTuple(slot);
+
+		if (!table_tuple_fetch_row_version(trigdata->tg_relation, it,
+										   SnapshotAny, slot))
+		{
+			const char *tuple_kind = old ? "tuple1" : "tuple2";
+
+			elog(ERROR, "failed to fetch %s for AFTER trigger", tuple_kind);
+		}
+
+		if (snapshot)
+		{
+			/*
+			 * We should not even consider checking the row if it is no longer
+			 * valid, since it was either deleted (so the deferred check
+			 * should be skipped) or updated (in which case only the latest
+			 * version of the row should be checked).  Test its liveness
+			 * according to SnapshotSelf. We need pin and lock on the buffer
+			 * to call HeapTupleSatisfiesVisibility.  Caller should be holding
+			 * pin, but not lock.
+			 */
+			if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, slot,
+												snapshot))
+				continue;
+
+			/*
+			 * In fact the snapshot is passed iff the slot contains a tuple of
+			 * the FK table being inserted / updated, so perform one more
+			 * related check in this branch while we have the tuple in the
+			 * slot. If we tested this later, we might need to remove the
+			 * tuple from the tuplestore, but tuplestore.c does not support
+			 * such an operation.
+			 */
+			if (!RI_FKey_check_query_required(trigdata->tg_trigger,
+											  trigdata->tg_relation, slot))
+				continue;
+		}
+
+		/*
+		 * Only store the key attributes.
+		 */
+		for (j = 0; j < nkeys; j++)
+			values[j] = slot_getattr(slot, attnums[j], &isnull[j]);
+
+		tuplestore_putvalues(result, tupdesc, values, isnull);
+		it++;
+	}
+
+	return result;
+}
+
+/*
+ * Like get_event_tuplestore(), but put both old and new key values into the
+ * same tuple. If the query (see RI_FKey_cascade_upd) used two tuplestores, it
+ * would have to join them somehow, but there's no suitable join column.
+ */
+static Tuplestorestate *
+get_event_tuplestore_for_cascade_update(TriggerData *trigdata,
+										const RI_ConstraintInfo *riinfo)
+{
+	ResourceOwner saveResourceOwner;
+	Tuplestorestate *result;
+	TIDArray   *ta_old,
+			   *ta_new;
+	ItemPointer it_old,
+				it_new;
+	TupleTableSlot *slot_old,
+			   *slot_new;
+	int			i;
+	Datum	   *values,
+			   *key_values;
+	bool	   *nulls,
+			   *key_nulls;
+	MemoryContext tuple_context;
+	Relation	rel = trigdata->tg_relation;
+	TupleDesc	desc_rel = RelationGetDescr(rel);
+
+	saveResourceOwner = CurrentResourceOwner;
+	CurrentResourceOwner = CurTransactionResourceOwner;
+	result = tuplestore_begin_heap(false, false, work_mem);
+	CurrentResourceOwner = saveResourceOwner;
+
+	/*
+	 * This context will be used for the contents of "values".
+	 *
+	 * CurrentMemoryContext should be the "batch context", as passed to
+	 * AfterTriggerExecuteRI().
+	 */
+	tuple_context =
+		AllocSetContextCreate(CurrentMemoryContext,
+							  "AfterTriggerCascadeUpdateContext",
+							  ALLOCSET_DEFAULT_SIZES);
+
+	ta_old = trigdata->ri_tids_old;
+	ta_new = trigdata->ri_tids_new;
+	Assert(ta_old->n == ta_new->n);
+
+	slot_old = trigdata->tg_trigslot;
+	slot_new = trigdata->tg_newslot;
+
+	key_values = (Datum *) palloc(riinfo->nkeys * 2 * sizeof(Datum));
+	key_nulls = (bool *) palloc(riinfo->nkeys * 2 * sizeof(bool));
+	values = (Datum *) palloc(desc_rel->natts * sizeof(Datum));
+	nulls = (bool *) palloc(desc_rel->natts * sizeof(bool));
+
+	it_old = ta_old->tids;
+	it_new = ta_new->tids;
+	for (i = 0; i < ta_old->n; i++)
+	{
+		MemoryContext oldcxt;
+
+		MemoryContextReset(tuple_context);
+		oldcxt = MemoryContextSwitchTo(tuple_context);
+
+		/*
+		 * Add the new values, followed by the old ones. This order is
+		 * expected to satisfy the parameters of the query generated in
+		 * ri_cascade_upd_query_single_row().
+		 */
+		add_key_values(slot_new, riinfo, trigdata->tg_relation, it_new,
+					   key_values, key_nulls, values, nulls, 0);
+		add_key_values(slot_old, riinfo, trigdata->tg_relation, it_old,
+					   key_values, key_nulls, values, nulls, riinfo->nkeys);
+		MemoryContextSwitchTo(oldcxt);
+
+		tuplestore_putvalues(result, riinfo->slot_both->tts_tupleDescriptor,
+							 key_values, key_nulls);
+
+		it_old++;
+		it_new++;
+	}
+	MemoryContextDelete(tuple_context);
+
+	return result;
+}
+
+/*
+ * Add key attributes "attnums" of relation "rel" to "tupdesc", starting at
+ * position "first".
+ */
+static void
+add_key_attrs_to_tupdesc(TupleDesc tupdesc, Relation rel,
+						 const RI_ConstraintInfo *riinfo, int16 *attnums,
+						 int first, bool generate_attnames)
+{
+	int			i;
+
+	for (i = 0; i < riinfo->nkeys; i++)
+	{
+		Oid			atttypid;
+		const char *attname;
+		char		attname_loc[NAMEDATALEN];
+		Form_pg_attribute att;
+
+		atttypid = RIAttType(rel, attnums[i]);
+
+		if (!generate_attnames)
+			attname = RIAttName(rel, attnums[i]);
+		else
+		{
+			const char *kind;
+
+			/*
+			 * The NEW/OLD order does not matter for bulk update, but the
+			 * tuple must start with the NEW values so that it fits the query
+			 * to check a single row when processing ON UPDATE CASCADE --- see
+			 * ri_cascade_upd_query_single_row().
+			 */
+			kind = first == 1 ? "new" : "old";
+
+			/*
+			 * Generate unique names instead of e.g. using prefix to
+			 * distinguish the old values from new ones. The prefix might be a
+			 * problem due to the limited attribute name length.
+			 */
+			snprintf(attname_loc, NAMEDATALEN, "pkatt%d_%s", i + 1, kind);
+			attname = attname_loc;
+		}
+
+		/* Take typmod and dimensions from the source relation's column. */
+		att = TupleDescAttr(RelationGetDescr(rel), attnums[i] - 1);
+		TupleDescInitEntry(tupdesc, first + i, attname, atttypid,
+						   att->atttypmod, att->attndims);
+	}
+}
+
+/*
+ * Retrieve tuple using given slot, deform it and add the attribute values to
+ * "key_values" and "key_null" arrays. "values" and "nulls" is a workspace to
+ * deform the tuple into. "first" tells where in the output array we should
+ * start.
+ */
+static void
+add_key_values(TupleTableSlot *slot, const RI_ConstraintInfo *riinfo,
+			   Relation rel, ItemPointer ip,
+			   Datum *key_values, bool *key_nulls,
+			   Datum *values, bool *nulls, int first)
+{
+	HeapTuple	tuple;
+	bool		shouldfree;
+	int			i,
+				c;
+
+	ExecClearTuple(slot);
+	if (!table_tuple_fetch_row_version(rel, ip, SnapshotAny, slot))
+	{
+		const char *tuple_kind = first == 0 ? "tuple1" : "tuple2";
+
+		elog(ERROR, "failed to fetch %s for AFTER trigger", tuple_kind);
+	}
+	tuple = ExecFetchSlotHeapTuple(slot, false, &shouldfree);
+
+	heap_deform_tuple(tuple, slot->tts_tupleDescriptor, values, nulls);
+
+	/* Pick the key values and store them in the output arrays. */
+	c = first;
+	for (i = 0; i < riinfo->nkeys; i++)
+	{
+		int16		attnum = riinfo->pk_attnums[i];
+
+		key_values[c] = values[attnum - 1];
+		key_nulls[c] = nulls[attnum - 1];
+		c++;
+	}
+
+	if (shouldfree)
+		pfree(tuple);
+}
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index a40ddf5db5..71566a3b7c 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -27,19 +27,42 @@
 
 typedef uint32 TriggerEvent;
 
+/*
+ * An intermediate storage for TIDs, in order to process multiple events by a
+ * single call of RI trigger.
+ *
+ * XXX Introduce a size limit and make caller of add_tid() aware of it?
+ */
+typedef struct TIDArray
+{
+	ItemPointerData *tids;
+	uint64		n;
+	uint64		nmax;
+} TIDArray;
+
 typedef struct TriggerData
 {
 	NodeTag		type;
+	int			tgindx;
 	TriggerEvent tg_event;
 	Relation	tg_relation;
 	HeapTuple	tg_trigtuple;
 	HeapTuple	tg_newtuple;
 	Trigger    *tg_trigger;
+	bool		is_ri_trigger;
 	TupleTableSlot *tg_trigslot;
 	TupleTableSlot *tg_newslot;
 	Tuplestorestate *tg_oldtable;
 	Tuplestorestate *tg_newtable;
 	const Bitmapset *tg_updatedcols;
+
+	TupleDesc	desc;
+
+	/*
+	 * RI triggers receive TIDs and fetch the corresponding tuples when needed.
+	 */
+	TIDArray   *ri_tids_old;
+	TIDArray   *ri_tids_new;
 } TriggerData;
 
 /*
@@ -262,6 +285,8 @@ extern bool RI_FKey_pk_upd_check_required(Trigger *trigger, Relation pk_rel,
 										  TupleTableSlot *old_slot, TupleTableSlot *new_slot);
 extern bool RI_FKey_fk_upd_check_required(Trigger *trigger, Relation fk_rel,
 										  TupleTableSlot *old_slot, TupleTableSlot *new_slot);
+extern TupleDesc RI_FKey_fk_attributes(Trigger *trigger, Relation trig_rel,
+									   const int16 **attnums_p);
 extern bool RI_Initial_Check(Trigger *trigger,
 							 Relation fk_rel, Relation pk_rel);
 extern void RI_PartitionRemove_Check(Trigger *trigger, Relation fk_rel,
-- 
2.20.1

v02-0005-Process-multiple-RI-trigger-events-at-a-time.patch (text/x-diff)
From fbf061a8706975ac4dbca77f6178b8523a4c117c Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 5/5] Process multiple RI trigger events at a time.

It should be more efficient to execute the check query once for a whole batch
of inserted / updated / deleted rows than to run it separately for every
single row.

If the user query only affects a single row, the simple query is still used to
check the RI, as it was before this patch.
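
For example, with a single-column foreign key f(i) referencing p(i), and
following the query templates described in the comments below, the single-row
case keeps using a parameterized lookup against the PK table of roughly this
shape:

    SELECT 1 FROM ONLY p x WHERE x.i = $1 FOR KEY SHARE OF x;

whereas the bulk case runs one statement over the transition rows registered
for the trigger (the transient table is named "tgoldtable" for inserts),
roughly:

    SELECT t.i
        FROM tgoldtable t LEFT JOIN LATERAL
            (SELECT t.i FROM ONLY p p
                 WHERE t.i = p.i
                 FOR KEY SHARE OF p) AS m ON t.i = m.i
        WHERE m.i ISNULL
        LIMIT 1;

which returns the first inserted row that has no matching PK row, if there is
one.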
---
 src/backend/utils/adt/ri_triggers.c | 995 ++++++++++++++++++++++------
 1 file changed, 789 insertions(+), 206 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 44d1e12a81..b07a4de909 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -69,15 +69,18 @@
 
 /* RI query type codes */
 /* these queries are executed against the PK (referenced) table: */
-#define RI_PLAN_CHECK_LOOKUPPK			1
-#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK	2
+#define RI_PLAN_CHECK_LOOKUPPK_SINGLE	1	/* check single row  */
+#define RI_PLAN_CHECK_LOOKUPPK_INS		2
+#define RI_PLAN_CHECK_LOOKUPPK_UPD		3
+#define RI_PLAN_CHECK_LOOKUPPK_FROM_PK	4
 #define RI_PLAN_LAST_ON_PK				RI_PLAN_CHECK_LOOKUPPK_FROM_PK
 /* these queries are executed against the FK (referencing) table: */
-#define RI_PLAN_CASCADE_DEL_DODELETE	3
-#define RI_PLAN_CASCADE_UPD_DOUPDATE	4
-#define RI_PLAN_RESTRICT_CHECKREF		5
-#define RI_PLAN_SETNULL_DOUPDATE		6
-#define RI_PLAN_SETDEFAULT_DOUPDATE		7
+#define RI_PLAN_CASCADE_DEL_DODELETE	5
+#define RI_PLAN_CASCADE_UPD_DOUPDATE	6
+#define RI_PLAN_RESTRICT_CHECKREF		7
+#define RI_PLAN_RESTRICT_CHECKREF_NO_ACTION		8
+#define RI_PLAN_SETNULL_DOUPDATE		9
+#define RI_PLAN_SETDEFAULT_DOUPDATE		10
 
 #define MAX_QUOTED_NAME_LEN  (NAMEDATALEN*2+3)
 #define MAX_QUOTED_REL_NAME_LEN  (MAX_QUOTED_NAME_LEN*2)
@@ -130,6 +133,7 @@ typedef struct RI_QueryKey
 {
 	Oid			constr_id;		/* OID of pg_constraint entry */
 	int32		constr_queryno; /* query type ID, see RI_PLAN_XXX above */
+	bool		single_row;		/* Checking a single row? */
 } RI_QueryKey;
 
 /*
@@ -177,6 +181,9 @@ static int	ri_constraint_cache_valid_count = 0;
 /*
  * Local function prototypes
  */
+static char *RI_FKey_check_query(const RI_ConstraintInfo *riinfo,
+								 Relation fk_rel, Relation pk_rel,
+								 bool insert);
 static char *RI_FKey_check_query_single_row(const RI_ConstraintInfo *riinfo,
 											Relation fk_rel, Relation pk_rel,
 											Oid *paramtypes);
@@ -185,18 +192,26 @@ static bool RI_FKey_check_query_required(Trigger *trigger, Relation fk_rel,
 static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 							  TupleTableSlot *oldslot,
 							  const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action,
-						 TupleTableSlot *oldslot);
+static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
+static char *ri_restrict_query(const RI_ConstraintInfo *riinfo,
+							   Relation fk_rel, Relation pk_rel,
+							   bool no_action);
 static char *ri_restrict_query_single_row(const RI_ConstraintInfo *riinfo,
 										  Relation fk_rel,
 										  Relation pk_rel, Oid *paramtypes);
+static char *ri_cascade_del_query(const RI_ConstraintInfo *riinfo,
+								  Relation fk_rel, Relation pk_rel);
 static char *ri_cascade_del_query_single_row(const RI_ConstraintInfo *riinfo,
 											 Relation fk_rel, Relation pk_rel,
 											 Oid *paramtypes);
+static char *ri_cascade_upd_query(const RI_ConstraintInfo *riinfo,
+								  Relation fk_rel, Relation pk_rel);
 static char *ri_cascade_upd_query_single_row(const RI_ConstraintInfo *riinfo,
 											 Relation fk_rel, Relation pk_rel,
 											 Oid *paramtypes);
 static Datum ri_set(TriggerData *trigdata, bool is_set_null);
+static char *ri_set_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+						  Relation pk_rel, bool is_set_null);
 static char *ri_set_query_single_row(const RI_ConstraintInfo *riinfo,
 									 Relation fk_rel, Relation pk_rel,
 									 Oid *paramtypes, bool is_set_null);
@@ -223,6 +238,9 @@ static void ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
 							const int16 *rattnums, const Oid *eq_oprs,
 							GenQualParams params, Oid *paramtypes);
 
+static void ri_GenerateKeyList(StringInfo buf, int nkeys,
+							   const char *tabname, Relation rel,
+							   const int16 *attnums);
 static void ri_GenerateQualComponent(StringInfo buf,
 									 const char *sep,
 									 const char *leftop, Oid leftoptype,
@@ -234,7 +252,8 @@ static int	ri_NullCheck(TupleDesc tupdesc, TupleTableSlot *slot,
 						 bool ignore_attnums);
 static void ri_BuildQueryKey(RI_QueryKey *key,
 							 const RI_ConstraintInfo *riinfo,
-							 int32 constr_queryno);
+							 int32 constr_queryno,
+							 bool single_row);
 static bool ri_KeysEqual(Relation rel, TupleTableSlot *oldslot, TupleTableSlot *newslot,
 						 const RI_ConstraintInfo *riinfo, bool rel_is_pk);
 static bool ri_AttributesEqual(Oid eq_opr, Oid typeid,
@@ -267,6 +286,10 @@ static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 							   Relation pk_rel, Relation fk_rel,
 							   TupleTableSlot *violatorslot, TupleDesc tupdesc,
 							   int queryno, bool partgone) pg_attribute_noreturn();
+static int	ri_register_trigger_data(TriggerData *tdata,
+									 Tuplestorestate *oldtable,
+									 Tuplestorestate *newtable,
+									 TupleDesc desc);
 static Tuplestorestate *get_event_tuplestore(TriggerData *trigdata, int nkeys,
 											 const int16 *attnums, bool old,
 											 TupleDesc tupdesc, Snapshot snapshot);
@@ -280,6 +303,8 @@ static void add_key_values(TupleTableSlot *slot,
 						   Relation rel, ItemPointer ip,
 						   Datum *key_values, bool *key_nulls,
 						   Datum *values, bool *nulls, int first);
+static TupleTableSlot *get_violator_tuple(Relation rel);
+
 
 /*
  * RI_FKey_check -
@@ -293,12 +318,15 @@ RI_FKey_check(TriggerData *trigdata)
 	Relation	fk_rel;
 	Relation	pk_rel;
 	bool		is_insert;
+	int			queryno;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
 	Tuplestorestate *oldtable = NULL;
 	Tuplestorestate *newtable = NULL;
 	Tuplestorestate *table;
+	bool		single_row;
 	TupleTableSlot *slot = NULL;
+	bool		found;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, false);
@@ -312,24 +340,6 @@ RI_FKey_check(TriggerData *trigdata)
 	fk_rel = trigdata->tg_relation;
 	pk_rel = table_open(riinfo->pk_relid, RowShareLock);
 
-	if (SPI_connect() != SPI_OK_CONNECT)
-		elog(ERROR, "SPI_connect failed");
-
-	/* Fetch or prepare a saved plan for the real check */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK);
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
-	{
-		char	   *query;
-		Oid			paramtypes[RI_MAX_NUMKEYS];
-
-		query = RI_FKey_check_query_single_row(riinfo, fk_rel, pk_rel,
-											   paramtypes);
-
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey, fk_rel,
-							 pk_rel);
-	}
-
 	/*
 	 * Retrieve the changed rows and put them into the appropriate tuplestore.
 	 */
@@ -370,23 +380,99 @@ RI_FKey_check(TriggerData *trigdata)
 	}
 
 	/*
-	 * Retrieve and check the inserted / updated rows, one after another.
+	 * The query to check a single row requires parameters, so retrieve them
+	 * now if that's the case.
 	 */
-	slot = riinfo->slot_fk;
-	while (tuplestore_gettupleslot(table, true, false, slot))
+	single_row = tuplestore_tuple_count(table) == 1;
+	if (single_row)
 	{
-		if (!ri_PerformCheck(riinfo, &qkey, qplan,
-							 fk_rel, pk_rel,
-							 slot,
-							 false,
-							 SPI_OK_SELECT))
-			ri_ReportViolation(riinfo,
-							   pk_rel, fk_rel,
-							   slot,
-							   NULL,
-							   qkey.constr_queryno, false);
+		slot = riinfo->slot_fk;
+		tuplestore_gettupleslot(table, true, false, slot);
+	}
+
+	if (SPI_connect() != SPI_OK_CONNECT)
+		elog(ERROR, "SPI_connect failed");
+
+	/*
+	 * Bulk processing needs the appropriate "transient table" to be
+	 * registered.
+	 */
+	if (!single_row &&
+		ri_register_trigger_data(trigdata, oldtable, newtable,
+								 riinfo->slot_fk->tts_tupleDescriptor) !=
+		SPI_OK_TD_REGISTER)
+		elog(ERROR, "ri_register_trigger_data failed");
+
+	if (single_row)
+		queryno = RI_PLAN_CHECK_LOOKUPPK_SINGLE;
+	else if (is_insert)
+		queryno = RI_PLAN_CHECK_LOOKUPPK_INS;
+	else
+		queryno = RI_PLAN_CHECK_LOOKUPPK_UPD;
+	ri_BuildQueryKey(&qkey, riinfo, queryno, single_row);
+
+	/* Fetch or prepare a saved plan for the real check */
+	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+	{
+		char	   *query;
+		int			nparams;
+		Oid			paramtypes[RI_MAX_NUMKEYS];
+
+		if (single_row)
+		{
+			query = RI_FKey_check_query_single_row(riinfo, fk_rel, pk_rel,
+												   paramtypes);
+
+			nparams = riinfo->nkeys;
+		}
+		else
+		{
+			query = RI_FKey_check_query(riinfo, fk_rel, pk_rel, is_insert);
+
+			nparams = 0;
+		}
+
+		/* Prepare and save the plan */
+		qplan = ri_PlanCheck(query, nparams, paramtypes, &qkey, fk_rel,
+							 pk_rel);
+	}
+
+	/*
+	 * Now check that foreign key exists in PK table
+	 */
+	found = ri_PerformCheck(riinfo, &qkey, qplan,
+							fk_rel, pk_rel,
+							slot,
+							false,
+							SPI_OK_SELECT);
+
+	/*
+	 * The query for bulk processing returns the first FK row that violates
+	 * the constraint, so use that row to report the violation.
+	 */
+	if (!single_row && found)
+	{
+		TupleTableSlot *violatorslot = get_violator_tuple(fk_rel);
+
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   violatorslot,
+						   NULL,
+						   qkey.constr_queryno, false);
 	}
 
+	/*
+	 * In contrast, the query to check a single FK row returns the matching PK
+	 * row. Failure to find that PK row indicates constraint violation and the
+	 * violating row is in "slot".
+	 */
+	else if (single_row && !found)
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   slot,
+						   NULL,
+						   qkey.constr_queryno, false);
+
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
 
@@ -395,6 +481,90 @@ RI_FKey_check(TriggerData *trigdata)
 	return PointerGetDatum(NULL);
 }
 
+/* ----------
+ * Construct the query to check inserted/updated rows of the FK table.
+ *
+ * If "insert" is true, the rows are inserted, otherwise they are updated.
+ *
+ * The query string built is
+ *	SELECT t.fkatt1 [, ...]
+ *		FROM <tgtable> t LEFT JOIN LATERAL
+ *		    (SELECT t.fkatt1 [, ...]
+ *               FROM [ONLY] <pktable> p
+ *		         WHERE t.fkatt1 = p.pkatt1 [AND ...]
+ *		         FOR KEY SHARE OF p) AS m
+ *		     ON t.fkatt1 = m.fkatt1 [AND ...]
+ *		WHERE m.fkatt1 ISNULL
+ *	    LIMIT 1
+ *
+ * where <tgtable> is "tgoldtable" for INSERT and "tgnewtable" for UPDATE
+ * events.
+ *
+ * It returns the first row that violates the constraint.
+ *
+ * "m" returns the new rows that do have matching PK row. It is a subquery
+ * because the FOR KEY SHARE clause cannot reference the nullable side of an
+ * outer join.
+ *
+ * XXX "tgoldtable" looks confusing for insert, but that's where
+ * AfterTriggerExecute() stores tuples whose events don't have
+ * AFTER_TRIGGER_2CTID set. For a non-RI trigger, the inserted tuple would
+ * fall into tg_trigtuple as opposed to tg_newtuple, which seems a similar
+ * problem. It doesn't seem worth any renaming or adding extra tuplestores to
+ * TriggerData.
+ * ----------
+ */
+static char *
+RI_FKey_check_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+					Relation pk_rel, bool insert)
+{
+	StringInfo	querybuf = makeStringInfo();
+	StringInfo	subquerybuf = makeStringInfo();
+	char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *pk_only;
+	const char *tgtable;
+	char	   *col_test;
+
+	tgtable = insert ? "tgoldtable" : "tgnewtable";
+
+	quoteRelationName(pkrelname, pk_rel);
+
+	/* Construct the subquery. */
+	pk_only = pk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	appendStringInfoString(subquerybuf,
+						   "(SELECT ");
+	ri_GenerateKeyList(subquerybuf, riinfo->nkeys, "t", fk_rel,
+					   riinfo->fk_attnums);
+	appendStringInfo(subquerybuf,
+					 " FROM %s%s p WHERE ",
+					 pk_only, pkrelname);
+	ri_GenerateQual(subquerybuf, "AND", riinfo->nkeys,
+					"p", pk_rel, riinfo->pk_attnums,
+					"t", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_NONE, NULL);
+	appendStringInfoString(subquerybuf, " FOR KEY SHARE OF p) AS m");
+
+	/* Now the main query. */
+	appendStringInfoString(querybuf, "SELECT ");
+	ri_GenerateKeyList(querybuf, riinfo->nkeys, "t", fk_rel,
+					   riinfo->fk_attnums);
+	appendStringInfo(querybuf,
+					 " FROM %s t LEFT JOIN LATERAL %s ON ",
+					 tgtable, subquerybuf->data);
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					"t", fk_rel, riinfo->fk_attnums,
+					"m", fk_rel, riinfo->fk_attnums,
+					riinfo->ff_eq_oprs,
+					GQ_PARAMS_NONE, NULL);
+	col_test = ri_ColNameQuoted("m", RIAttName(fk_rel, riinfo->fk_attnums[0]));
+	appendStringInfo(querybuf, " WHERE %s ISNULL ", col_test);
+	appendStringInfoString(querybuf, " LIMIT 1");
+
+	return querybuf->data;
+}
+
 /* ----------
  * Like RI_FKey_check_query(), but check a single row.
  *
@@ -576,7 +746,7 @@ ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
 	 * Fetch or prepare a saved plan for checking PK table with values coming
 	 * from a PK row
 	 */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK);
+	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CHECK_LOOKUPPK_FROM_PK, true);
 
 	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 	{
@@ -643,7 +813,7 @@ RI_FKey_noaction_del(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_noaction_del", RI_TRIGTYPE_DELETE);
 
 	/* Share code with RESTRICT/UPDATE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, true, NULL);
+	return ri_restrict((TriggerData *) fcinfo->context, true);
 }
 
 /*
@@ -663,7 +833,7 @@ RI_FKey_restrict_del(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_restrict_del", RI_TRIGTYPE_DELETE);
 
 	/* Share code with NO ACTION/UPDATE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, false, NULL);
+	return ri_restrict((TriggerData *) fcinfo->context, false);
 }
 
 /*
@@ -680,7 +850,7 @@ RI_FKey_noaction_upd(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_noaction_upd", RI_TRIGTYPE_UPDATE);
 
 	/* Share code with RESTRICT/DELETE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, true, NULL);
+	return ri_restrict((TriggerData *) fcinfo->context, true);
 }
 
 /*
@@ -700,7 +870,7 @@ RI_FKey_restrict_upd(PG_FUNCTION_ARGS)
 	ri_CheckTrigger(fcinfo, "RI_FKey_restrict_upd", RI_TRIGTYPE_UPDATE);
 
 	/* Share code with NO ACTION/DELETE cases. */
-	return ri_restrict((TriggerData *) fcinfo->context, false, NULL);
+	return ri_restrict((TriggerData *) fcinfo->context, false);
 }
 
 /*
@@ -708,20 +878,18 @@ RI_FKey_restrict_upd(PG_FUNCTION_ARGS)
  *
  * Common code for ON DELETE RESTRICT, ON DELETE NO ACTION,
  * ON UPDATE RESTRICT, and ON UPDATE NO ACTION.
- *
- * If NULL is passed for oldslot, retrieve the rows from
- * trigdata->ri_tids_old.
  */
 static Datum
-ri_restrict(TriggerData *trigdata, bool is_no_action, TupleTableSlot *oldslot)
+ri_restrict(TriggerData *trigdata, bool is_no_action)
 {
 	const RI_ConstraintInfo *riinfo;
 	Relation	fk_rel;
 	Relation	pk_rel;
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
-	Tuplestorestate *oldtable = NULL;
-	bool		first_tuple;
+	Tuplestorestate *oldtable;
+	bool		single_row;
+	TupleTableSlot *oldslot = NULL;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, true);
@@ -735,45 +903,81 @@ ri_restrict(TriggerData *trigdata, bool is_no_action, TupleTableSlot *oldslot)
 	fk_rel = table_open(riinfo->fk_relid, RowShareLock);
 	pk_rel = trigdata->tg_relation;
 
+	oldtable = get_event_tuplestore(trigdata,
+									riinfo->nkeys,
+									riinfo->pk_attnums,
+									true,
+									riinfo->slot_pk->tts_tupleDescriptor,
+									NULL);
+
+	/* Should we use a special query to check a single row? */
+	single_row = tuplestore_tuple_count(oldtable) == 1;
+	if (single_row)
+	{
+		/* The query needs parameters, so retrieve them now. */
+		oldslot = riinfo->slot_pk;
+		tuplestore_gettupleslot(oldtable, true, false, oldslot);
+
+		/*
+		 * If another PK row now exists providing the old key values, we
+		 * should not do anything.  However, this check should only be made in
+		 * the NO ACTION case; in RESTRICT cases we don't wish to allow
+		 * another row to be substituted.
+		 */
+		if (is_no_action &&
+			ri_Check_Pk_Match(pk_rel, fk_rel, oldslot, riinfo))
+		{
+			table_close(fk_rel, RowShareLock);
+			return PointerGetDatum(NULL);
+		}
+	}
+
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
-	/* Fetch or prepare a saved plan for the real check */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF);
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+	/* Bulk processing needs the "transient table" to be registered. */
+	if (!single_row &&
+		ri_register_trigger_data(trigdata, oldtable, NULL,
+								 riinfo->slot_pk->tts_tupleDescriptor) !=
+		SPI_OK_TD_REGISTER)
+		elog(ERROR, "ri_register_trigger_data failed");
+
+	if (single_row)
 	{
-		char	   *query;
-		Oid			paramtypes[RI_MAX_NUMKEYS];
+		ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF, true);
 
-		query = ri_restrict_query_single_row(riinfo, fk_rel, pk_rel,
-											 paramtypes);
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			char	   *query;
+			Oid			paramtypes[RI_MAX_NUMKEYS];
 
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
-							 fk_rel, pk_rel);
-	}
+			query = ri_restrict_query_single_row(riinfo, fk_rel, pk_rel,
+												 paramtypes);
 
-	if (oldslot == NULL)
-	{
-		oldtable = get_event_tuplestore(trigdata,
-										riinfo->nkeys,
-										riinfo->pk_attnums,
-										true,
-										riinfo->slot_pk->tts_tupleDescriptor,
-										NULL);
-		oldslot = riinfo->slot_pk;
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
+								 fk_rel, pk_rel);
+		}
 	}
+	else if (!is_no_action)
+	{
+		/*
+		 * Fetch or prepare a saved plan for the restrict lookup (it's the
+		 * same query for delete and update cases)
+		 */
+		ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF, false);
 
-	first_tuple = true;
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			char	   *query;
 
-	/*
-	 * Retrieve and check the rows, one after another.
-	 *
-	 * One tuple should always be processed: if there's no "oldtable", valid
-	 * "oldslot" should have been passed.
-	 */
-	while ((oldtable && tuplestore_gettupleslot(oldtable, true, false, oldslot))
-		   || first_tuple)
+			query = ri_restrict_query(riinfo, fk_rel, pk_rel, false);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, 0, NULL, &qkey, fk_rel, pk_rel);
+		}
+	}
+	else
 	{
 		/*
 		 * If another PK row now exists providing the old key values, we
@@ -781,30 +985,46 @@ ri_restrict(TriggerData *trigdata, bool is_no_action, TupleTableSlot *oldslot)
 		 * the NO ACTION case; in RESTRICT cases we don't wish to allow
 		 * another row to be substituted.
 		 */
-		if (is_no_action &&
-			ri_Check_Pk_Match(pk_rel, fk_rel, oldslot, riinfo))
-			continue;
+		ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_RESTRICT_CHECKREF_NO_ACTION,
+						 false);
 
-		if (ri_PerformCheck(riinfo, &qkey, qplan,
-							fk_rel, pk_rel,
-							oldslot,
-							true,	/* must detect new rows */
-							SPI_OK_SELECT))
-			ri_ReportViolation(riinfo,
-							   pk_rel, fk_rel,
-							   oldslot,
-							   NULL,
-							   qkey.constr_queryno, false);
-
-		if (first_tuple)
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 		{
-			if (oldtable == NULL)
-				break;
 
-			first_tuple = false;
+			char	   *query;
+
+			query = ri_restrict_query(riinfo, fk_rel, pk_rel, true);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, 0, NULL, &qkey, fk_rel, pk_rel);
 		}
 	}
 
+	/*
+	 * We have a plan now. Run it to check for existing references.
+	 */
+	if (ri_PerformCheck(riinfo, &qkey, qplan,
+						fk_rel, pk_rel,
+						oldslot,
+						true,	/* must detect new rows */
+						SPI_OK_SELECT))
+	{
+		TupleTableSlot *violatorslot;
+
+		/*
+		 * For a single row, oldslot contains the violating key. For bulk
+		 * check, the problematic key value should have been returned by the
+		 * query.
+		 */
+		violatorslot = single_row ? oldslot : get_violator_tuple(pk_rel);
+
+		ri_ReportViolation(riinfo,
+						   pk_rel, fk_rel,
+						   violatorslot,
+						   NULL,
+						   qkey.constr_queryno, false);
+	}
+
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
 
@@ -813,6 +1033,79 @@ ri_restrict(TriggerData *trigdata, bool is_no_action, TupleTableSlot *oldslot)
 	return PointerGetDatum(NULL);
 }
 
+/* ----------
+ * Construct the query to check whether deleted row of the PK table is still
+ * referenced by the FK table.
+ *
+ * If "pk_rel" is NULL, the query string built is
+ *	SELECT o.*
+ *		FROM [ONLY] <fktable> f, tgoldtable o
+ *		WHERE f.fkatt1 = o.pkatt1 [AND ...]
+ *		FOR KEY SHARE OF f
+ *		LIMIT 1
+ *
+ * If no_action is true, also check if the row being deleted was re-inserted
+ * into the PK table (or in case of UPDATE, if row with the old key is there
+ * again):
+ *
+ *	SELECT o.pkatt1 [, ...]
+ *		FROM [ONLY] <fktable> f, tgoldtable o
+ *		WHERE f.fkatt1 = o.pkatt1 [AND ...] AND	NOT EXISTS
+ *			(SELECT 1
+ *			FROM <pktable> p
+ *			WHERE p.pkatt1 = o.pkatt1 [, ...]
+ *			FOR KEY SHARE OF p)
+ *		FOR KEY SHARE OF f
+ *		LIMIT 1
+ *
+ * TODO Is ONLY needed for the PK table?
+ * ----------
+ */
+static char *
+ri_restrict_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+				  Relation pk_rel, bool no_action)
+{
+	StringInfo	querybuf = makeStringInfo();
+	StringInfo	subquerybuf = NULL;
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *fk_only;
+
+	if (no_action)
+	{
+		char		pkrelname[MAX_QUOTED_REL_NAME_LEN];
+
+		subquerybuf = makeStringInfo();
+		quoteRelationName(pkrelname, pk_rel);
+		appendStringInfo(subquerybuf,
+						 "(SELECT 1 FROM %s p WHERE ", pkrelname);
+		ri_GenerateQual(subquerybuf, "AND", riinfo->nkeys,
+						"p", pk_rel, riinfo->pk_attnums,
+						"o", pk_rel, riinfo->pk_attnums,
+						riinfo->pp_eq_oprs,
+						GQ_PARAMS_NONE, NULL);
+		appendStringInfoString(subquerybuf, " FOR KEY SHARE OF p)");
+	}
+
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfoString(querybuf, "SELECT ");
+	ri_GenerateKeyList(querybuf, riinfo->nkeys, "o", pk_rel,
+					   riinfo->pk_attnums);
+	appendStringInfo(querybuf, " FROM %s%s f, tgoldtable o WHERE ",
+					 fk_only, fkrelname);
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					"o", pk_rel, riinfo->pk_attnums,
+					"f", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_NONE, NULL);
+	if (no_action)
+		appendStringInfo(querybuf, " AND NOT EXISTS %s", subquerybuf->data);
+	appendStringInfoString(querybuf, " FOR KEY SHARE OF f LIMIT 1");
+
+	return querybuf->data;
+}
+
 /*
  * Like ri_restrict_query(), but check a single row.
  */
@@ -867,7 +1160,8 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
 	Tuplestorestate *oldtable;
-	TupleTableSlot *oldslot;
+	bool		single_row;
+	TupleTableSlot *oldslot = NULL;
 
 	/* Check that this is a valid trigger call on the right time and event. */
 	ri_CheckTrigger(fcinfo, "RI_FKey_cascade_del", RI_TRIGTYPE_DELETE);
@@ -884,46 +1178,67 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
 
+	oldtable = get_event_tuplestore(trigdata,
+									riinfo->nkeys,
+									riinfo->pk_attnums,
+									true,
+									riinfo->slot_pk->tts_tupleDescriptor,
+									NULL);
+
+	/* Should we use a special query to check a single row? */
+	single_row = tuplestore_tuple_count(oldtable) == 1;
+
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_DEL_DODELETE);
+	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_DEL_DODELETE, single_row);
 
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+	if (single_row)
 	{
-		Oid			paramtypes[RI_MAX_NUMKEYS];
-		char	   *query = ri_cascade_del_query_single_row(riinfo,
-															fk_rel,
-															pk_rel,
-															paramtypes);
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			Oid			paramtypes[RI_MAX_NUMKEYS];
+			char	   *query = ri_cascade_del_query_single_row(riinfo,
+																fk_rel,
+																pk_rel,
+																paramtypes);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
+								 fk_rel, pk_rel);
+		}
 
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
-							 fk_rel, pk_rel);
+		/* The query needs parameters, so retrieve them now. */
+		oldslot = riinfo->slot_pk;
+		tuplestore_gettupleslot(oldtable, true, false, oldslot);
 	}
+	else
+	{
+		/* Bulk processing needs the "transient table" to be registered. */
+		if (ri_register_trigger_data(trigdata, oldtable, NULL,
+									 riinfo->slot_pk->tts_tupleDescriptor) !=
+			SPI_OK_TD_REGISTER)
+			elog(ERROR, "ri_register_trigger_data failed");
 
-	oldtable = get_event_tuplestore(trigdata,
-									riinfo->nkeys,
-									riinfo->pk_attnums,
-									true,
-									riinfo->slot_pk->tts_tupleDescriptor,
-									NULL);
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			char	   *query = ri_cascade_del_query(riinfo, fk_rel, pk_rel);
 
-	/* Retrieve and check the rows, one after another. */
-	oldslot = riinfo->slot_pk;
-	while (tuplestore_gettupleslot(oldtable, true, false, oldslot))
-	{
-		/*
-		 * We have a plan now. Build up the arguments from the key values in
-		 * the deleted PK tuple and delete the referencing rows
-		 */
-		ri_PerformCheck(riinfo, &qkey, qplan,
-						fk_rel, pk_rel,
-						oldslot,
-						true,	/* must detect new rows */
-						SPI_OK_DELETE);
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, 0, NULL, &qkey, fk_rel, pk_rel);
+		}
 	}
 
+	/*
+	 * We have a plan now. Build up the arguments from the key values in the
+	 * deleted PK tuple and delete the referencing rows
+	 */
+	ri_PerformCheck(riinfo, &qkey, qplan,
+					fk_rel, pk_rel,
+					oldslot,
+					true,		/* must detect new rows */
+					SPI_OK_DELETE);
+
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
 
@@ -932,6 +1247,43 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
 	return PointerGetDatum(NULL);
 }
 
+static char *
+ri_cascade_del_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+					 Relation pk_rel)
+{
+	StringInfo	querybuf = makeStringInfo();
+	StringInfo	subquerybuf = makeStringInfo();
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *fk_only;
+
+	/* ----------
+	 * The query string built is
+	 *
+	 *	DELETE FROM [ONLY] <fktable> f
+	 *	    WHERE EXISTS
+	 *			(SELECT 1
+	 *			FROM tgoldtable o
+	 *			WHERE o.pkatt1 = f.fkatt1 [AND ...])
+	 * ----------
+	 */
+	appendStringInfoString(subquerybuf,
+						   "SELECT 1 FROM tgoldtable o WHERE ");
+	ri_GenerateQual(subquerybuf, "AND", riinfo->nkeys,
+					"o", pk_rel, riinfo->pk_attnums,
+					"f", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_NONE, NULL);
+
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfo(querybuf,
+					 "DELETE FROM %s%s f WHERE EXISTS (%s) ",
+					 fk_only, fkrelname, subquerybuf->data);
+
+	return querybuf->data;
+}
+
 static char *
 ri_cascade_del_query_single_row(const RI_ConstraintInfo *riinfo,
 								Relation fk_rel, Relation pk_rel,
@@ -983,7 +1335,8 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
 	Tuplestorestate *newtable;
-	TupleTableSlot *slot;
+	bool		single_row;
+	TupleTableSlot *newslot = NULL;
 
 	/* Check that this is a valid trigger call on the right time and event. */
 	ri_CheckTrigger(fcinfo, "RI_FKey_cascade_upd", RI_TRIGTYPE_UPDATE);
@@ -1001,44 +1354,66 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
 
+	/*
+	 * In this case, both new and old values should be in the same tuplestore
+	 * because there's no useful join column.
+	 */
+	newtable = get_event_tuplestore_for_cascade_update(trigdata, riinfo);
+
+	/* Should we use a special query to check a single row? */
+	single_row = tuplestore_tuple_count(newtable) == 1;
+
+	/* Fetch or prepare a saved plan for the cascaded update */
+	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_UPD_DOUPDATE, single_row);
+
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
-	/* Fetch or prepare a saved plan for the cascaded update */
-	ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_UPD_DOUPDATE);
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+	if (single_row)
 	{
-		Oid			paramtypes[RI_MAX_NUMKEYS * 2];
-		char	   *query = ri_cascade_upd_query_single_row(riinfo,
-															fk_rel,
-															pk_rel,
-															paramtypes);
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			Oid			paramtypes[RI_MAX_NUMKEYS * 2];
+			char	   *query = ri_cascade_upd_query_single_row(riinfo,
+																fk_rel,
+																pk_rel,
+																paramtypes);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, 2 * riinfo->nkeys, paramtypes, &qkey,
+								 fk_rel, pk_rel);
+		}
 
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(query, 2 * riinfo->nkeys, paramtypes, &qkey,
-							 fk_rel, pk_rel);
+		/* The query needs parameters, so retrieve them now. */
+		newslot = riinfo->slot_both;
+		tuplestore_gettupleslot(newtable, true, false, newslot);
 	}
+	else
+	{
+		/* Here it doesn't matter whether we call the table "old" or "new". */
+		if (ri_register_trigger_data(trigdata, NULL, newtable,
+									 riinfo->slot_both->tts_tupleDescriptor) !=
+			SPI_OK_TD_REGISTER)
+			elog(ERROR, "ri_register_trigger_data failed");
 
-	/*
-	 * In this case, both old and new values should be in the same tuplestore
-	 * because there's no useful join column.
-	 */
-	newtable = get_event_tuplestore_for_cascade_update(trigdata, riinfo);
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			char	   *query = ri_cascade_upd_query(riinfo, fk_rel, pk_rel);
 
-	/* Retrieve and check the rows, one after another. */
-	slot = riinfo->slot_both;
-	while (tuplestore_gettupleslot(newtable, true, false, slot))
-	{
-		/*
-		 * We have a plan now. Run it to update the existing references.
-		 */
-		ri_PerformCheck(riinfo, &qkey, qplan,
-						fk_rel, pk_rel,
-						slot,
-						true,	/* must detect new rows */
-						SPI_OK_UPDATE);
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, 0, NULL, &qkey, fk_rel, pk_rel);
+		}
 	}
 
+	/*
+	 * We have a plan now. Run it to update the existing references.
+	 */
+	ri_PerformCheck(riinfo, &qkey, qplan,
+					fk_rel, pk_rel,
+					newslot,
+					true,		/* must detect new rows */
+					SPI_OK_UPDATE);
+
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
 
@@ -1047,6 +1422,69 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
 	return PointerGetDatum(NULL);
 }
 
+static char *
+ri_cascade_upd_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+					 Relation pk_rel)
+{
+	StringInfo	querybuf = makeStringInfo();
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *fk_only;
+	int			i;
+
+	/* ----------
+	 * The query string built is
+	 *
+	 * UPDATE [ONLY] <fktable> f
+	 *     SET fkatt1 = n.pkatt1_new [, ...]
+	 *     FROM tgnewtable n
+	 *     WHERE
+	 *         f.fkatt1 = n.pkatt1_old [AND ...]
+	 *
+	 * Note that we are assuming there is an assignment cast from the PK
+	 * to the FK type; else the parser will fail.
+	 * ----------
+	 */
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfo(querybuf, "UPDATE %s%s f SET ", fk_only, fkrelname);
+
+	for (i = 0; i < riinfo->nkeys; i++)
+	{
+		char	   *latt = ri_ColNameQuoted("", RIAttName(fk_rel, riinfo->fk_attnums[i]));
+		Oid			lcoll = RIAttCollation(fk_rel, riinfo->fk_attnums[i]);
+		char		ratt[NAMEDATALEN];
+		Oid			rcoll = RIAttCollation(pk_rel, riinfo->pk_attnums[i]);
+
+		snprintf(ratt, NAMEDATALEN, "n.pkatt%d_new", i + 1);
+
+		if (i > 0)
+			appendStringInfoString(querybuf, ", ");
+
+		appendStringInfo(querybuf, "%s = %s", latt, ratt);
+
+		if (lcoll != rcoll)
+			ri_GenerateQualCollation(querybuf, lcoll);
+	}
+
+	appendStringInfo(querybuf, " FROM tgnewtable n WHERE");
+
+	for (i = 0; i < riinfo->nkeys; i++)
+	{
+		char	   *fattname;
+
+		if (i > 0)
+			appendStringInfoString(querybuf, " AND");
+
+		fattname =
+			ri_ColNameQuoted("f",
+							 RIAttName(fk_rel, riinfo->fk_attnums[i]));
+		appendStringInfo(querybuf, " %s = n.pkatt%d_old", fattname, i + 1);
+	}
+
+	return querybuf->data;
+}
+
 static char *
 ri_cascade_upd_query_single_row(const RI_ConstraintInfo *riinfo,
 								Relation fk_rel, Relation pk_rel,
@@ -1186,7 +1624,8 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	RI_QueryKey qkey;
 	SPIPlanPtr	qplan;
 	Tuplestorestate *oldtable;
-	TupleTableSlot *oldslot;
+	bool		single_row;
+	TupleTableSlot *oldslot = NULL;
 
 	riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
 									trigdata->tg_relation, true);
@@ -1200,9 +1639,19 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
 	pk_rel = trigdata->tg_relation;
 
+	oldtable = get_event_tuplestore(trigdata,
+									riinfo->nkeys,
+									riinfo->pk_attnums,
+									true,
+									riinfo->slot_pk->tts_tupleDescriptor,
+									NULL);
+
 	if (SPI_connect() != SPI_OK_CONNECT)
 		elog(ERROR, "SPI_connect failed");
 
+	/* Should we use a special query to check a single row? */
+	single_row = tuplestore_tuple_count(oldtable) == 1;
+
 	/*
 	 * Fetch or prepare a saved plan for the set null/default operation (it's
 	 * the same query for delete and update cases)
@@ -1210,64 +1659,124 @@ ri_set(TriggerData *trigdata, bool is_set_null)
 	ri_BuildQueryKey(&qkey, riinfo,
 					 (is_set_null
 					  ? RI_PLAN_SETNULL_DOUPDATE
-					  : RI_PLAN_SETDEFAULT_DOUPDATE));
-	if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
-	{
-		Oid			paramtypes[RI_MAX_NUMKEYS];
-		char	   *query = ri_set_query_single_row(riinfo, fk_rel, pk_rel,
-													paramtypes, is_set_null);
+					  : RI_PLAN_SETDEFAULT_DOUPDATE),
+					 single_row);
 
-		/* Prepare and save the plan */
-		qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
-							 fk_rel, pk_rel);
-	}
+	if (single_row)
+	{
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+		{
+			Oid			paramtypes[RI_MAX_NUMKEYS];
+			char	   *query = ri_set_query_single_row(riinfo, fk_rel, pk_rel,
+														paramtypes, is_set_null);
 
-	oldtable = get_event_tuplestore(trigdata,
-									riinfo->nkeys,
-									riinfo->pk_attnums,
-									true,
-									riinfo->slot_pk->tts_tupleDescriptor,
-									NULL);
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, riinfo->nkeys, paramtypes, &qkey,
+								 fk_rel, pk_rel);
+		}
 
-	/* The query needs parameters, so retrieve them now. */
-	oldslot = riinfo->slot_pk;
-	while (tuplestore_gettupleslot(oldtable, true, false, oldslot))
+		/* The query needs parameters, so retrieve them now. */
+		oldslot = riinfo->slot_pk;
+		tuplestore_gettupleslot(oldtable, true, false, oldslot);
+	}
+	else
 	{
-		/*
-		 * We have a plan now. Run it to update the existing references.
-		 */
-		ri_PerformCheck(riinfo, &qkey, qplan,
-						fk_rel, pk_rel,
-						oldslot,
-						true,	/* must detect new rows */
-						SPI_OK_UPDATE);
+		/* Here it doesn't matter whether we call the table "old" or "new". */
+		if (ri_register_trigger_data(trigdata, oldtable, NULL,
+									 riinfo->slot_pk->tts_tupleDescriptor) !=
+			SPI_OK_TD_REGISTER)
+			elog(ERROR, "ri_register_trigger_data failed");
 
-		if (!is_set_null)
+		if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
 		{
-			/*
-			 * If we just deleted or updated the PK row whose key was equal to
-			 * the FK columns' default values, and a referencing row exists in
-			 * the FK table, we would have updated that row to the same values
-			 * it already had --- and RI_FKey_fk_upd_check_required would
-			 * hence believe no check is necessary.  So we need to do another
-			 * lookup now and in case a reference still exists, abort the
-			 * operation.  That is already implemented in the NO ACTION
-			 * trigger, so just run it. (This recheck is only needed in the
-			 * SET DEFAULT case, since CASCADE would remove such rows in case
-			 * of a DELETE operation or would change the FK key values in case
-			 * of an UPDATE, while SET NULL is certain to result in rows that
-			 * satisfy the FK constraint.)
-			 */
-			ri_restrict(trigdata, true, oldslot);
+			char	   *query = ri_set_query(riinfo, fk_rel, pk_rel,
+											 is_set_null);
+
+			/* Prepare and save the plan */
+			qplan = ri_PlanCheck(query, 0, NULL, &qkey, fk_rel, pk_rel);
 		}
 	}
 
+	/*
+	 * We have a plan now. Run it to update the existing references.
+	 */
+	ri_PerformCheck(riinfo, &qkey, qplan,
+					fk_rel, pk_rel,
+					oldslot,
+					true,		/* must detect new rows */
+					SPI_OK_UPDATE);
+
 	if (SPI_finish() != SPI_OK_FINISH)
 		elog(ERROR, "SPI_finish failed");
 
 	table_close(fk_rel, RowExclusiveLock);
 
-	return PointerGetDatum(NULL);
+	if (is_set_null)
+		return PointerGetDatum(NULL);
+	else
+	{
+		/*
+		 * If we just deleted or updated the PK row whose key was equal to the
+		 * FK columns' default values, and a referencing row exists in the FK
+		 * table, we would have updated that row to the same values it already
+		 * had --- and RI_FKey_fk_upd_check_required would hence believe no
+		 * check is necessary.  So we need to do another lookup now and in
+		 * case a reference still exists, abort the operation.  That is
+		 * already implemented in the NO ACTION trigger, so just run it. (This
+		 * recheck is only needed in the SET DEFAULT case, since CASCADE would
+		 * remove such rows in case of a DELETE operation or would change the
+		 * FK key values in case of an UPDATE, while SET NULL is certain to
+		 * result in rows that satisfy the FK constraint.)
+		 */
+		return ri_restrict(trigdata, true);
+	}
+}
+
+static char *
+ri_set_query(const RI_ConstraintInfo *riinfo, Relation fk_rel,
+			 Relation pk_rel, bool is_set_null)
+{
+	StringInfo	querybuf = makeStringInfo();
+	char		fkrelname[MAX_QUOTED_REL_NAME_LEN];
+	const char *querysep;
+	const char *fk_only;
+
+	/* ----------
+	 * The query string built is
+	 *	UPDATE [ONLY] <fktable> f
+	 *	    SET fkatt1 = {NULL|DEFAULT} [, ...]
+	 *	    FROM tgoldtable o
+	 *		WHERE o.pkatt1 = f.fkatt1 [AND ...]
+	 * ----------
+	 */
+	initStringInfo(querybuf);
+	fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+		"" : "ONLY ";
+	quoteRelationName(fkrelname, fk_rel);
+	appendStringInfo(querybuf, "UPDATE %s%s f SET",
+					 fk_only, fkrelname);
+	querysep = "";
+	for (int i = 0; i < riinfo->nkeys; i++)
+	{
+		char		attname[MAX_QUOTED_NAME_LEN];
+
+		quoteOneName(attname,
+					 RIAttName(fk_rel, riinfo->fk_attnums[i]));
+		appendStringInfo(querybuf,
+						 "%s %s = %s",
+						 querysep, attname,
+						 is_set_null ? "NULL" : "DEFAULT");
+		querysep = ",";
+	}
+
+	appendStringInfoString(querybuf, " FROM tgoldtable o WHERE ");
+	ri_GenerateQual(querybuf, "AND", riinfo->nkeys,
+					"o", pk_rel, riinfo->pk_attnums,
+					"f", fk_rel, riinfo->fk_attnums,
+					riinfo->pf_eq_oprs,
+					GQ_PARAMS_NONE, NULL);
+
+	return querybuf->data;
 }
 
 static char *
@@ -1716,7 +2225,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 		ri_ReportViolation(&fake_riinfo,
 						   pk_rel, fk_rel,
 						   slot, tupdesc,
-						   RI_PLAN_CHECK_LOOKUPPK, false);
+						   RI_PLAN_CHECK_LOOKUPPK_INS, false);
 
 		ExecDropSingleTupleTableSlot(slot);
 	}
@@ -2041,6 +2550,25 @@ ri_GenerateQualComponent(StringInfo buf,
 							 rightop, rightoptype);
 }
 
+/*
+ * ri_GenerateKeyList --- generate comma-separated list of key attributes.
+ */
+static void
+ri_GenerateKeyList(StringInfo buf, int nkeys,
+				   const char *tabname, Relation rel,
+				   const int16 *attnums)
+{
+	for (int i = 0; i < nkeys; i++)
+	{
+		char	   *att = ri_ColNameQuoted(tabname, RIAttName(rel, attnums[i]));
+
+		if (i > 0)
+			appendStringInfoString(buf, ", ");
+
+		appendStringInfoString(buf, att);
+	}
+}
+
 /*
  * ri_ColNameQuoted() --- return column name, with both table and column name
  * quoted.
@@ -2169,7 +2697,7 @@ ri_GenerateQualCollation(StringInfo buf, Oid collation)
  */
 static void
 ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
-				 int32 constr_queryno)
+				 int32 constr_queryno, bool single_row)
 {
 	/*
 	 * We assume struct RI_QueryKey contains no padding bytes, else we'd need
@@ -2177,6 +2705,7 @@ ri_BuildQueryKey(RI_QueryKey *key, const RI_ConstraintInfo *riinfo,
 	 */
 	key->constr_id = riinfo->constraint_id;
 	key->constr_queryno = constr_queryno;
+	key->single_row = single_row;
 }
 
 /*
@@ -2620,7 +3149,9 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
 	/*
 	 * Determine which relation to complain about.
 	 */
-	onfk = queryno == RI_PLAN_CHECK_LOOKUPPK;
+	onfk = (queryno == RI_PLAN_CHECK_LOOKUPPK_SINGLE ||
+			queryno == RI_PLAN_CHECK_LOOKUPPK_INS ||
+			queryno == RI_PLAN_CHECK_LOOKUPPK_UPD);
 	if (onfk)
 	{
 		attnums = riinfo->fk_attnums;
@@ -3173,6 +3704,40 @@ RI_FKey_trigger_type(Oid tgfoid)
 	return RI_TRIGGER_NONE;
 }
 
+/*
+ * Wrapper around SPI_register_trigger_data() that lets us register the RI
+ * trigger tuplestores w/o having to set tg_oldtable/tg_newtable and also w/o
+ * having to set tgoldtable/tgnewtable in pg_trigger.
+ *
+ * XXX This is rather a hack, try to invent something better.
+ */
+static int
+ri_register_trigger_data(TriggerData *tdata, Tuplestorestate *oldtable,
+						 Tuplestorestate *newtable, TupleDesc desc)
+{
+	TriggerData *td = (TriggerData *) palloc(sizeof(TriggerData));
+	Trigger    *tg = (Trigger *) palloc(sizeof(Trigger));
+	int			result;
+
+	Assert(tdata->tg_trigger->tgoldtable == NULL &&
+		   tdata->tg_trigger->tgnewtable == NULL);
+
+	*td = *tdata;
+
+	td->tg_oldtable = oldtable;
+	td->tg_newtable = newtable;
+
+	*tg = *tdata->tg_trigger;
+	tg->tgoldtable = pstrdup("tgoldtable");
+	tg->tgnewtable = pstrdup("tgnewtable");
+	td->tg_trigger = tg;
+	td->desc = desc;
+
+	result = SPI_register_trigger_data(td);
+
+	return result;
+}
+
 /*
  * Turn TID array into a tuplestore. If snapshot is passed, only use tuples
  * visible by this snapshot.
@@ -3441,3 +4006,21 @@ add_key_values(TupleTableSlot *slot, const RI_ConstraintInfo *riinfo,
 	if (shouldfree)
 		pfree(tuple);
 }
+
+
+/*
+ * Retrieve the row that violates RI constraint and return it in a tuple slot.
+ */
+static TupleTableSlot *
+get_violator_tuple(Relation rel)
+{
+	HeapTuple	tuple;
+	TupleTableSlot *slot;
+
+	Assert(SPI_tuptable && SPI_tuptable->numvals == 1);
+
+	tuple = SPI_tuptable->vals[0];
+	slot = MakeSingleTupleTableSlot(SPI_tuptable->tupdesc, &TTSOpsHeapTuple);
+	ExecStoreHeapTuple(tuple, slot, false);
+	return slot;
+}
-- 
2.20.1

#32Andres Freund
andres@anarazel.de
In reply to: Antonin Houska (#31)
Re: More efficient RI checks - take 2

Hi,

I was looking at this patch with Corey during a patch-review session. So
these are basically our "combined" comments.

On 2020-06-05 17:16:43 +0200, Antonin Houska wrote:

From 6c1cb8ae7fbf0a8122d8c6637c61b9915bc57223 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 1/5] Check for RI violation outside ri_PerformCheck().

Probably good to add a short comment to the commit explaining why you're
doing this.

The change makes sense to me. Unless somebody protests I think we should
just apply it regardless of the rest of the series - the code seems
clearer afterwards.

From 6b09e5598553c8e57b4ef9342912f51adb48f8af Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 2/5] Changed ri_GenerateQual() so it generates the whole
qualifier.

This way we can use the function to reduce the amount of copy&pasted code a
bit.

/*
- * ri_GenerateQual --- generate a WHERE clause equating two variables
+ * ri_GenerateQual --- generate WHERE/ON clause.
+ *
+ * Note: to avoid unnecessary explicit casts, make sure that the left and
+ * right operands match eq_oprs expect (ie don't swap the left and right
+ * operands accidentally).
+ */
+static void
+ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
+				const char *ltabname, Relation lrel,
+				const int16 *lattnums,
+				const char *rtabname, Relation rrel,
+				const int16 *rattnums,
+				const Oid *eq_oprs,
+				GenQualParams params,
+				Oid *paramtypes)
+{
+	for (int i = 0; i < nkeys; i++)
+	{
+		Oid			ltype = RIAttType(lrel, lattnums[i]);
+		Oid			rtype = RIAttType(rrel, rattnums[i]);
+		Oid			lcoll = RIAttCollation(lrel, lattnums[i]);
+		Oid			rcoll = RIAttCollation(rrel, rattnums[i]);
+		char		paramname[16];
+		char	   *latt,
+				   *ratt;
+		char	   *sep_current = i > 0 ? sep : NULL;
+
+		if (params != GQ_PARAMS_NONE)
+			sprintf(paramname, "$%d", i + 1);
+
+		if (params == GQ_PARAMS_LEFT)
+		{
+			latt = paramname;
+			paramtypes[i] = ltype;
+		}
+		else
+			latt = ri_ColNameQuoted(ltabname, RIAttName(lrel, lattnums[i]));
+
+		if (params == GQ_PARAMS_RIGHT)
+		{
+			ratt = paramname;
+			paramtypes[i] = rtype;
+		}
+		else
+			ratt = ri_ColNameQuoted(rtabname, RIAttName(rrel, rattnums[i]));

Why do we need support for having params on left or right side, instead
of just having them on one side?

+		ri_GenerateQualComponent(buf, sep_current, latt, ltype, eq_oprs[i],
+								 ratt, rtype);
+
+		if (lcoll != rcoll)
+			ri_GenerateQualCollation(buf, lcoll);
+	}
+}
+
+/*
+ * ri_GenerateQual --- generate a component of WHERE/ON clause equating two
+ * variables, to be AND-ed to the other components.
*
* This basically appends " sep leftop op rightop" to buf, adding casts
* and schema qualification as needed to ensure that the parser will select
@@ -1828,17 +1802,86 @@ quoteRelationName(char *buffer, Relation rel)
* if they aren't variables or parameters.
*/
static void
-ri_GenerateQual(StringInfo buf,
-				const char *sep,
-				const char *leftop, Oid leftoptype,
-				Oid opoid,
-				const char *rightop, Oid rightoptype)
+ri_GenerateQualComponent(StringInfo buf,
+						 const char *sep,
+						 const char *leftop, Oid leftoptype,
+						 Oid opoid,
+						 const char *rightop, Oid rightoptype)
{
-	appendStringInfo(buf, " %s ", sep);
+	if (sep)
+		appendStringInfo(buf, " %s ", sep);
generate_operator_clause(buf, leftop, leftoptype, opoid,
rightop, rightoptype);
}

Why is this handled inside ri_GenerateQualComponent() instead of
ri_GenerateQual()? Especially because the latter now has code to pass in
a different sep into ri_GenerateQualComponent().

+/*
+ * ri_ColNameQuoted() --- return column name, with both table and column name
+ * quoted.
+ */
+static char *
+ri_ColNameQuoted(const char *tabname, const char *attname)
+{
+	char		quoted[MAX_QUOTED_NAME_LEN];
+	StringInfo	result = makeStringInfo();
+
+	if (tabname && strlen(tabname) > 0)
+	{
+		quoteOneName(quoted, tabname);
+		appendStringInfo(result, "%s.", quoted);
+	}
+
+	quoteOneName(quoted, attname);
+	appendStringInfoString(result, quoted);
+
+	return result->data;
+}

Why does this new function accept a NULL / zero length string? I guess
that's because we currently don't qualify in all places?

+/*
+ * Check that RI trigger function was called in expected context
+ */
+static void
+ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
+{
+	TriggerData *trigdata = (TriggerData *) fcinfo->context;
+
+	if (!CALLED_AS_TRIGGER(fcinfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+				 errmsg("function \"%s\" was not called by trigger manager", funcname)));
+
+	/*
+	 * Check proper event
+	 */
+	if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
+		!TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
+		ereport(ERROR,
+				(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+				 errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
+
+	switch (tgkind)
+	{
+		case RI_TRIGTYPE_INSERT:
+			if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for INSERT", funcname)));
+			break;
+		case RI_TRIGTYPE_UPDATE:
+			if (!TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for UPDATE", funcname)));
+			break;
+
+		case RI_TRIGTYPE_DELETE:
+			if (!TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
+				ereport(ERROR,
+						(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+						 errmsg("function \"%s\" must be fired for DELETE", funcname)));
+			break;
+	}
+}
+

Why did you move this around, as part of this commit?

From 208c733d759592402901599446b4f7e7197c1777 Mon Sep 17 00:00:00 2001
From: Antonin Houska <ah@cybertec.at>
Date: Fri, 5 Jun 2020 16:42:34 +0200
Subject: [PATCH 4/5] Introduce infrastructure for batch processing RI events.

Separate storage is used for the RI trigger events because the "transient
table" that we provide to statement triggers would not be available for
deferred constraints. Also, the regular statement level trigger is not ideal
for the RI checks because it requires the query execution to complete before
the RI checks even start. On the other hand, if we use batches of row trigger
events, we only need to tune the batch size so that user gets RI violation
error rather soon.

This patch only introduces the infrastructure, however the trigger function is
still called per event. This is just to reduce the size of the diffs.
---
src/backend/commands/tablecmds.c | 68 +-
src/backend/commands/trigger.c | 406 ++++++--
src/backend/executor/spi.c | 16 +-
src/backend/utils/adt/ri_triggers.c | 1385 +++++++++++++++++++--------
src/include/commands/trigger.h | 25 +
5 files changed, 1381 insertions(+), 519 deletions(-)

My first comment here is that this is too large a change and should be
broken up.

I think there's also not enough explanation in here of what the new design
is. I can infer some of it from the code, but that's imo shifting work
to the reviewer / reader unnecessarily.

+static void AfterTriggerExecuteRI(EState *estate,
+								  ResultRelInfo *relInfo,
+								  FmgrInfo *finfo,
+								  Instrumentation *instr,
+								  TriggerData *trig_last,
+								  MemoryContext batch_context);
static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid,
CmdType cmdType);
static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs);
@@ -3807,13 +3821,16 @@ afterTriggerDeleteHeadEventChunk(AfterTriggersQueryData *qs)
*	fmgr lookup cache space at the caller level.  (For triggers fired at
*	the end of a query, we can even piggyback on the executor's state.)
*
- *	event: event currently being fired.
+ *	event: event currently being fired. Pass NULL if the current batch of RI
+ *		trigger events should be processed.
*	rel: open relation for event.
*	trigdesc: working copy of rel's trigger info.
*	finfo: array of fmgr lookup cache entries (one per trigger in trigdesc).
*	instr: array of EXPLAIN ANALYZE instrumentation nodes (one per trigger),
*		or NULL if no instrumentation is wanted.
+ *	trig_last: trigger info used for the last trigger execution.
*	per_tuple_context: memory context to call trigger function in.
+ *	batch_context: memory context to store tuples for RI triggers.
*	trig_tuple_slot1: scratch slot for tg_trigtuple (foreign tables only)
*	trig_tuple_slot2: scratch slot for tg_newtuple (foreign tables only)
* ----------
@@ -3824,39 +3841,55 @@ AfterTriggerExecute(EState *estate,
ResultRelInfo *relInfo,
TriggerDesc *trigdesc,
FmgrInfo *finfo, Instrumentation *instr,
+					TriggerData *trig_last,
MemoryContext per_tuple_context,
+					MemoryContext batch_context,
TupleTableSlot *trig_tuple_slot1,
TupleTableSlot *trig_tuple_slot2)
{
Relation	rel = relInfo->ri_RelationDesc;
AfterTriggerShared evtshared = GetTriggerSharedData(event);
Oid			tgoid = evtshared->ats_tgoid;
-	TriggerData LocTriggerData = {0};
HeapTuple	rettuple;
-	int			tgindx;
bool		should_free_trig = false;
bool		should_free_new = false;
+	bool		is_new = false;
-	/*
-	 * Locate trigger in trigdesc.
-	 */
-	for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
+	if (trig_last->tg_trigger == NULL)
{
-		if (trigdesc->triggers[tgindx].tgoid == tgoid)
+		int			tgindx;
+
+		/*
+		 * Locate trigger in trigdesc.
+		 */
+		for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
{
-			LocTriggerData.tg_trigger = &(trigdesc->triggers[tgindx]);
-			break;
+			if (trigdesc->triggers[tgindx].tgoid == tgoid)
+			{
+				trig_last->tg_trigger = &(trigdesc->triggers[tgindx]);
+				trig_last->tgindx = tgindx;
+				break;
+			}
}
+		if (trig_last->tg_trigger == NULL)
+			elog(ERROR, "could not find trigger %u", tgoid);
+
+		if (RI_FKey_trigger_type(trig_last->tg_trigger->tgfoid) !=
+			RI_TRIGGER_NONE)
+			trig_last->is_ri_trigger = true;
+
+		is_new = true;
}
-	if (LocTriggerData.tg_trigger == NULL)
-		elog(ERROR, "could not find trigger %u", tgoid);
+
+	/* trig_last for non-RI trigger should always be initialized again. */
+	Assert(trig_last->is_ri_trigger || is_new);
/*
* If doing EXPLAIN ANALYZE, start charging time to this trigger. We want
* to include time spent re-fetching tuples in the trigger cost.
*/
-	if (instr)
-		InstrStartNode(instr + tgindx);
+	if (instr && !trig_last->is_ri_trigger)
+		InstrStartNode(instr + trig_last->tgindx);

I'm pretty unhappy about the amount of new infrastructure this adds to
trigger.c. We're now going to have a third copy of the tuples (for a
time). trigger.c is already a pretty complicated / archaic piece of
infrastructure, and this patchset seems to make that even worse. We'll
grow yet another separate representation of tuples, and there are a lot of
new branches (less concerned about the runtime costs, more about the code
complexity), etc.

+/* ----------
+ * Construct the query to check inserted/updated rows of the FK table.
+ *
+ * If "insert" is true, the rows are inserted, otherwise they are updated.
+ *
+ * The query string built is
+ *	SELECT t.fkatt1 [, ...]
+ *		FROM <tgtable> t LEFT JOIN LATERAL
+ *		    (SELECT t.fkatt1 [, ...]
+ *               FROM [ONLY] <pktable> p
+ *		         WHERE t.fkatt1 = p.pkatt1 [AND ...]
+ *		         FOR KEY SHARE OF p) AS m
+ *		     ON t.fkatt1 = m.fkatt1 [AND ...]
+ *		WHERE m.fkatt1 ISNULL
+ *	    LIMIT 1
+ *

Why do we need the lateral query here?

Greetings,

Andres Freund

#33Justin Pryzby
pryzby@telsasoft.com
In reply to: Antonin Houska (#31)
Re: More efficient RI checks - take 2

On Fri, Jun 05, 2020 at 05:16:43PM +0200, Antonin Houska wrote:

Antonin Houska <ah@cybertec.at> wrote:

In general, the checks are significantly faster if there are many rows to
process, and a bit slower when we only need to check a single row.

Attached is a new version that uses the existing simple queries if there's
only one row to check. SPI is used for both single-row and bulk checks - as
discussed in this thread, it can perhaps be replaced with a different approach
if it appears to be beneficial, at least for the single-row checks.

I think using a separate query for the single-row check is more practicable
than convincing the planner that the bulk-check query should only check a
single row. So this patch version tries to show what it'd look like.

I'm interested in testing this patch, however there's a lot of internals to
digest.

Are there any documentation updates or regression tests to add? If FKs
support "bulk" validation, users should know when that applies, and be able to
check that it's working as intended. Even if the test cases are overly verbose
or not stable, and not intended for commit, that would be a useful temporary
addition.

I think that calls=4 indicates this is using bulk validation.

postgres=# begin; explain(analyze, timing off, costs off, summary off, verbose) DELETE FROM t WHERE i<999; rollback;
BEGIN
QUERY PLAN
-----------------------------------------------------------------------
Delete on public.t (actual rows=0 loops=1)
-> Index Scan using t_pkey on public.t (actual rows=998 loops=1)
Output: ctid
Index Cond: (t.i < 999)
Trigger RI_ConstraintTrigger_a_16399 for constraint t_i_fkey: calls=4

I started thinking about this 1+ years ago wondering if a BRIN index could be
used for (bulk) FK validation.
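For illustration, a rough sketch of the kind of index I have in mind (table and
column names here are made up; assume a referencing table fk_tab(i), since the
bulk check scans the referencing side):

CREATE INDEX fk_tab_i_brin ON fk_tab USING brin (i);

The single-row check is unlikely to ever pick that up, but the bulk query that
joins the transient table against the FK table might be able to use a bitmap
scan on it.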

So I would like to be able to see the *plan* for the query.

I was able to show the plan and see that BRIN can be used like so:
|SET auto_explain.log_nested_statements=on; SET client_min_messages=debug; SET auto_explain.log_min_duration=0;
Should the plan be visible in explain (not auto-explain)?

BTW did you see this older thread ?
/messages/by-id/CA+U5nMLM1DaHBC6JXtUMfcG6f7FgV5mPSpufO7GRnbFKkF2f7g@mail.gmail.com

--
Justin

#34Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#33)
Re: More efficient RI checks - take 2

On Sat, Sep 26, 2020 at 09:59:17PM -0500, Justin Pryzby wrote:

I'm interested in testing this patch, however there's a lot of internals to
digest.

Even with that, the thread has been waiting on the author for a couple of
weeks now, so I have marked the entry as RwF.
--
Michael

#35Antonin Houska
ah@cybertec.at
In reply to: Justin Pryzby (#33)
Re: More efficient RI checks - take 2

Justin Pryzby <pryzby@telsasoft.com> wrote:

I'm interested in testing this patch, however there's a lot of internals to
digest.

Are there any documentation updates or regression tests to add?

I'm not sure if user documentation should be changed unless a new GUC or
statistics information is added. As for regression tests, perhaps in the next
version of the patch. But right now I don't know how to implement the feature
in a less invasive way (see the complaint by Andres in [1]), nor do I have
enough time to work on the patch.

If FKs support "bulk" validation, users should know when that applies, and
be able to check that it's working as intended. Even if the test cases are
overly verbose or not stable, and not intended for commit, that would be a
useful temporary addition.

I think that calls=4 indicates this is using bulk validation.

postgres=# begin; explain(analyze, timing off, costs off, summary off, verbose) DELETE FROM t WHERE i<999; rollback;
BEGIN
QUERY PLAN
-----------------------------------------------------------------------
Delete on public.t (actual rows=0 loops=1)
-> Index Scan using t_pkey on public.t (actual rows=998 loops=1)
Output: ctid
Index Cond: (t.i < 999)
Trigger RI_ConstraintTrigger_a_16399 for constraint t_i_fkey: calls=4

I started thinking about this 1+ years ago wondering if a BRIN index could be
used for (bulk) FK validation.

So I would like to be able to see the *plan* for the query.

I was able to show the plan and see that BRIN can be used like so:
|SET auto_explain.log_nested_statements=on; SET client_min_messages=debug; SET auto_explain.log_min_duration=0;
Should the plan be visible in explain (not auto-explain)?

For development purposes, I think I could get the plan this way:

SET debug_print_plan TO on;
SET client_min_messages TO debug;

(The plan is cached, so I think the query will only be displayed during the
first execution in the session).
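
(A possible workaround, I think, is to start a fresh session and repeat the
settings there, e.g.:

\c
SET debug_print_plan TO on;
SET client_min_messages TO debug;

The RI plan cache lives in backend-local memory, so the first execution in the
new backend plans the check query again and the plan gets printed.)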

Do you think that the documentation should advise the user to create a BRIN
index on the FK table?

BTW did you see this older thread ?
/messages/by-id/CA+U5nMLM1DaHBC6JXtUMfcG6f7FgV5mPSpufO7GRnbFKkF2f7g@mail.gmail.com

Not yet. Thanks.

[1]: /messages/by-id/20200630011729.mr25bmmbvsattxe2@alap3.anarazel.de

--
Antonin Houska
Web: https://www.cybertec-postgresql.com