Referential Integrity Checks with Statement-level Triggers
Back when Pg added statement-level triggers, I was interested in the
potential promise of moving referential integrity checks to statement-level
triggers.
The initial conversation included Kevin Grittner's POC script (in SQL),
which showed the potential for a 98% reduction in time spent doing RI checks.
The original thread is here:
/messages/by-id/CACjxUsM4s9=CUmPU4YFOYiD5f=2ULVDBjuFSo20Twe7KbUe8Mw@mail.gmail.com
I dug around in the code, and was rather surprised at how close we already
are to implementing this. The function RI_Initial_Check() already does a
left-join query via SPI to look for any invalid data, so if we could just
replace the near table with the transition table for inserted rows, we'd be
home free. The function SPI_register_trigger_data() makes the transition
tables visible to SPI, so I started to wonder why this hadn't been done
already.
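As a rough illustration of the idea (this is not the literal SQL that
RI_Initial_Check() builds; pk_table, fk_table, new_rows, and the column
names are placeholders, with new_rows standing in for the INSERT transition
table), the statement-level check boils down to an anti-join:

-- Hypothetical statement-level INSERT check: find any inserted FK value
-- that has no matching PK row; any row returned is an RI violation.
SELECT n.fk_col
  FROM new_rows n
  LEFT JOIN pk_table p ON p.pk_col = n.fk_col
 WHERE n.fk_col IS NOT NULL
   AND p.pk_col IS NULL
 LIMIT 1;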
I approached Kevin and Thomas Munro for feedback on my approach. It also
became a session at the PgConf.ASIA un-conference, and I discussed it later
with Michael Paquier at that same conference. The coalesced feedback was
this:
- the overhead of registering the transition tables probably makes it
unprofitable for single row inserts
- the single row overhead is itself significant, so maybe the transition
tables aren't worse
- there has been talk of replacing transition tables with an in-memory data
structure that would be closer to "free" from a startup perspective and
might even coalesce the transition tables of multiple statements in the
same transaction
- because nothing changes declaratively, it's trivial to switch from row
level to statement level triggering via pg_upgrade
- assuming that transition tables are an overhead that only pays off when >
N rows have been updated, does it make sense to enforce RI with something
that isn't actually a trigger?
- there was also some mention that parallel query uses a queue mechanism
that might be leveraged to do row-level triggers for updates of <= N rows
and statement level for > N
That's what I have so far. I'm going to be working on a POC patch so that I
can benchmark a pure-statement-level solution, which if nothing else will
let us know the approximate value of N.
All suggestions are appreciated.
It is great. I thought about it a little bit - just theoretically.
A ROW trigger calls the RI check too often, and a statement trigger too
rarely. I think the ideal design would be to call the RI check every 10K
rows. It can be unfriendly if somebody does a very long import and it fails
at the end. I don't think there would be any performance difference if the
RI check is called per 1000 or more rows.
Regards
Pavel
On 2018-Dec-17, Pavel Stehule wrote:
A ROW trigger calls the RI check too often, and a statement trigger too
rarely. I think the ideal design would be to call the RI check every 10K
rows. It can be unfriendly if somebody does a very long import and it fails
at the end. I don't think there would be any performance difference if the
RI check is called per 1000 or more rows.
This is a good point, but I'm not sure if it's possible to implement
using statement-level triggers. I think the way transition tables work
is that you get the full results at the end of the command; there's no
way to pass control to the RI stuff at arbitrary points during the
execution of the command.
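For illustration, a minimal user-level statement trigger with a transition
table (all names below are made up) only ever sees the complete set of
affected rows, and only after the whole statement has run:

CREATE FUNCTION report_deleted_count() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- old_rows holds every row the statement deleted, visible only now
    RAISE NOTICE 'statement deleted % rows', (SELECT count(*) FROM old_rows);
    RETURN NULL;
END;
$$;

CREATE TRIGGER pk_delete_stmt
    AFTER DELETE ON pk_table
    REFERENCING OLD TABLE AS old_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION report_deleted_count();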
Is there any guidance in the SQL standard about this? I don't think the
timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
this. Or do they?
Maybe there is a solution for this. I think it's worth thinking about,
even if it's just to say that we won't do it.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Dec 17, 2018 at 18:27, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
On 2018-Dec-17, Pavel Stehule wrote:
A ROW trigger calls the RI check too often, and a statement trigger too
rarely. I think the ideal design would be to call the RI check every 10K
rows. It can be unfriendly if somebody does a very long import and it fails
at the end. I don't think there would be any performance difference if the
RI check is called per 1000 or more rows.
This is a good point, but I'm not sure if it's possible to implement
using statement-level triggers. I think the way transition tables work
is that you get the full results at the end of the command; there's no
way to pass control to the RI stuff at arbitrary points during the
execution of the command.
It was just an idea. When I think about it, I am not sure that an RI check
from a statement trigger isn't worse when the statement's changes are very
big. On the other hand, maybe it is premature optimization. We can check
performance on a prototype.
It's something I know I am interested in. For me, I don't really care if my
statement doesn't cancel until the very end if there is an RI violation. The
benefit of not having deletes be slow on tables which have others
referencing them with a fkey which doesn't have its own index is huge IMO. I
have a good number of those types of logging tables where an index is not
useful 99% of the time, but every once in a while a bulk delete needs to
happen.
It is far from a premature optimization IMO, it is super useful and
something I was hoping would happen ever since I heard about transition
tables being worked on.
Just my $0.02.
-Adam
On Mon, Dec 17, 2018 at 19:19, Adam Brusselback <adambrusselback@gmail.com>
wrote:
It's something I know I am interested in. For me, I don't really care if
my statement doesn't cancel until the very end if there is an RI violation.
The benefit of not having deletes be slow on tables which have others
referencing them with a fkey which doesn't have its own index is huge IMO. I
have a good number of those types of logging tables where an index is not
useful 99% of the time, but every once in a while a bulk delete needs to
happen.
It is far from a premature optimization IMO, it is super useful and
something I was hoping would happen ever since I heard about transition
tables being worked on.
Note: my sentence about premature optimization was related to my idea to
divide the RI check per 10K rows.
It would be great if the RI check were faster.
On Mon, Dec 17, 2018 at 11:27 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
On 2018-Dec-17, Pavel Stehule wrote:
A ROW trigger calls the RI check too often, and a statement trigger too
rarely. I think the ideal design would be to call the RI check every 10K
rows. It can be unfriendly if somebody does a very long import and it fails
at the end. I don't think there would be any performance difference if the
RI check is called per 1000 or more rows.
This is a good point, but I'm not sure if it's possible to implement
using statement-level triggers. I think the way transition tables work
is that you get the full results at the end of the command; there's no
way to pass control to the RI stuff at arbitrary points during the
execution of the command.
Is there any guidance in the SQL standard about this? I don't think the
timing indicators in the standard (IMMEDIATE, DEFERRED) have any say on
this. Or do they?
Yes, they do. *ALL* AFTER triggers fire after the statement
completes; it's a question of whether a particular trigger fires once
for the whole statement or once for each row. Observe:
test=# CREATE TABLE t1 (t1id int PRIMARY KEY, t1desc text);
CREATE TABLE
test=# CREATE TABLE t2 (t2id int PRIMARY KEY, t1id int NOT NULL, t2desc text,
test(# FOREIGN KEY (t1id) REFERENCES t1);
CREATE TABLE
test=# CREATE FUNCTION t2_insert_func()
test-# RETURNS TRIGGER
test-# LANGUAGE plpgsql
test-# AS $$
test$# BEGIN
test$# RAISE NOTICE '%', new;
test$# RETURN new;
test$# END;
test$# $$;
CREATE FUNCTION
test=# CREATE TRIGGER t2_insert_trig
test-# BEFORE INSERT ON t2
test-# FOR EACH ROW
test-# EXECUTE FUNCTION t2_insert_func();
CREATE TRIGGER
test=# INSERT INTO t1 VALUES (1), (2), (3);
INSERT 0 3
test=# INSERT INTO t2 VALUES (10, 1), (20, 2), (30, 3), (40, 4), (50, 5);
NOTICE: (10,1,)
NOTICE: (20,2,)
NOTICE: (30,3,)
NOTICE: (40,4,)
NOTICE: (50,5,)
ERROR: insert or update on table "t2" violates foreign key constraint
"t2_t1id_fkey"
DETAIL: Key (t1id)=(4) is not present in table "t1".
All inserts occur before the statement fails, per standard.
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
On Mon, Dec 17, 2018 at 8:32 AM Corey Huinker <corey.huinker@gmail.com> wrote:
All suggestions are appreciated.
As I recall, the main issue is how to handle concurrency. The
existing RI triggers do some very special handling of the multi-xact
ID to manage concurrent modifications. Has anyone looked at the
issues with using those techniques with set-oriented statements for
the transition table approach?
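For context, the per-row check that the existing RI triggers run against
the referenced table has roughly the following shape (a simplified sketch,
not the exact internal query); the FOR KEY SHARE row lock is where the
multi-xact handling comes into play:

-- Approximate shape of the existing row-level RI check fired when a
-- referencing row is inserted or updated; $1 is the FK value being checked.
SELECT 1 FROM ONLY pk_table x
 WHERE x.pk_col = $1
   FOR KEY SHARE OF x;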
--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
It is far from a premature optimization IMO, it is super useful and something I was hoping would happen ever since I heard about transition tables being worked on.
Me too. Never-ending DELETEs are a common pain point, especially for
people who migrated from MySQL, which creates indexes for foreign keys
automatically.
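As an aside, the usual workaround for that pain point today is to create
the index on the referencing column yourself (illustrative names only):

-- Without an index on the referencing column, every PK row deleted forces a
-- scan of the referencing table for its RI check; with one, it's an index probe.
CREATE INDEX log_entries_account_id_idx ON log_entries (account_id);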
Attached is a patch that refactors DELETE triggers to fire at the statement
level.
I chose delete triggers partly out of simplicity, and partly because there is
some before/after row linkage in the ON UPDATE CASCADE cases where
statement-level triggers might not be feasible as we have currently
implemented them.
After having done the work, I think INSERT triggers would be similarly
straightforward, but wanted to limit scope.
Also, after having stripped the delete cases out of the update-or-delete
functions, it became obvious that the on-update-set-null and
on-update-set-default cases differed by only 3-4 lines, so those functions
were combined.
On a vagrant VM running on my desktop machine, I'm seeing a speed-up of
about 25% in the benchmark provided. I think that figure is cloudy and
below my expectations. Perhaps we'd get a much better picture of whether or
not this is worth it on a bare metal machine, or at least a VM better
suited to benchmarking.
Currently 4 make-check tests are failing. Two of them appear to be false
positives (the tests make assumptions about triggers that are no longer
true), and the other two are outside the scope of this benchmark, so I'll
revisit them if we go forward.
ri-set-logic.sql is an edited benchmark script adapted from Kevin
Grittner's benchmark that he ran against hand-rolled triggers and posted on
2016-11-02
ri_test.out is a copy-paste of two runs of the benchmark script.
Many thanks to everyone who helped, often despite their own objections to
the overall reasoning behind the endeavor. I'm aware that a large
contingent of highly experienced people would very much like to replace our
entire trigger architecture, or at least divorce RI checks from triggers.
Maybe this patch spurs on that change. Even if nothing comes of it, it's
been a great learning experience.
On Sat, Dec 22, 2018 at 11:28 AM Emre Hasegeli <emre@hasegeli.com> wrote:
It is far from a premature optimization IMO, it is super useful and
something I was hoping would happen ever since I heard about transition
tables being worked on.
Me too. Never-ending DELETEs are a common pain point, especially for
people who migrated from MySQL, which creates indexes for foreign keys
automatically.
Attachments:
0001-Change-Delete-RI-triggers-to-Statement-Level-Trigger.patch (text/x-patch)
From 8a73f9233211076421a565b5c90ecd029b5e6581 Mon Sep 17 00:00:00 2001
From: vagrant <vagrant@pgcoredev>
Date: Wed, 23 Jan 2019 16:59:17 +0000
Subject: [PATCH] Change Delete RI triggers to Statement-Level Triggers
---
src/backend/commands/tablecmds.c | 9 +-
src/backend/commands/trigger.c | 2 +
src/backend/utils/adt/ri_triggers.c | 779 ++++++++++++++++++++--------
3 files changed, 566 insertions(+), 224 deletions(-)
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 28a137bb53..21f5bf94a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8954,6 +8954,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
{
CreateTrigStmt *fk_trigger;
+ TriggerTransition *del = makeNode(TriggerTransition);
+ del->name = "pg_deleted_transition_table";
+ del->isNew = false;
+ del->isTable = true;
+
/*
* Build and execute a CREATE CONSTRAINT TRIGGER statement for the ON
* DELETE action on the referenced table.
@@ -8961,11 +8966,11 @@ createForeignKeyActionTriggers(Relation rel, Oid refRelOid, Constraint *fkconstr
fk_trigger = makeNode(CreateTrigStmt);
fk_trigger->trigname = "RI_ConstraintTrigger_a";
fk_trigger->relation = NULL;
- fk_trigger->row = true;
+ fk_trigger->row = false;
fk_trigger->timing = TRIGGER_TYPE_AFTER;
fk_trigger->events = TRIGGER_TYPE_DELETE;
fk_trigger->columns = NIL;
- fk_trigger->transitionRels = NIL;
+ fk_trigger->transitionRels = list_make1(del);
fk_trigger->whenClause = NULL;
fk_trigger->isconstraint = true;
fk_trigger->constrrel = NULL;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7ffaeaffc6..080587215f 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -510,7 +510,9 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
*
* Currently this is enforced by the grammar, so just Assert here.
*/
+ /*
Assert(!stmt->isconstraint);
+ */
if (tt->isNew)
{
diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index e1aa3d0044..6f89ab4c77 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -194,9 +194,10 @@ static int ri_constraint_cache_valid_count = 0;
static bool ri_Check_Pk_Match(Relation pk_rel, Relation fk_rel,
HeapTuple old_row,
const RI_ConstraintInfo *riinfo);
-static Datum ri_restrict(TriggerData *trigdata, bool is_no_action);
-static Datum ri_setnull(TriggerData *trigdata);
-static Datum ri_setdefault(TriggerData *trigdata);
+static Datum ri_on_update_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action);
+static Datum ri_on_update_set(TriggerData *trigdata, bool set_null);
+static Datum ri_on_delete_set(TriggerData *trigdata, bool set_null);
static void quoteOneName(char *buffer, const char *name);
static void quoteRelationName(char *buffer, Relation rel);
static void ri_GenerateQual(StringInfo buf,
@@ -603,7 +604,7 @@ RI_FKey_noaction_del(PG_FUNCTION_ARGS)
/*
* Share code with RESTRICT/UPDATE cases.
*/
- return ri_restrict((TriggerData *) fcinfo->context, true);
+ return ri_on_delete_restrict((TriggerData *) fcinfo->context, true);
}
/* ----------
@@ -628,7 +629,7 @@ RI_FKey_restrict_del(PG_FUNCTION_ARGS)
/*
* Share code with NO ACTION/UPDATE cases.
*/
- return ri_restrict((TriggerData *) fcinfo->context, false);
+ return ri_on_delete_restrict((TriggerData *) fcinfo->context, false);
}
/* ----------
@@ -650,7 +651,7 @@ RI_FKey_noaction_upd(PG_FUNCTION_ARGS)
/*
* Share code with RESTRICT/DELETE cases.
*/
- return ri_restrict((TriggerData *) fcinfo->context, true);
+ return ri_on_update_restrict((TriggerData *) fcinfo->context, true);
}
/* ----------
@@ -675,18 +676,249 @@ RI_FKey_restrict_upd(PG_FUNCTION_ARGS)
/*
* Share code with NO ACTION/DELETE cases.
*/
- return ri_restrict((TriggerData *) fcinfo->context, false);
+ return ri_on_update_restrict((TriggerData *) fcinfo->context, false);
}
/* ----------
- * ri_restrict -
+ * ri_on_delete_restrict -
+ *
+ * Common code for ON DELETE RESTRICT, ON DELETE NO ACTION
+ * ----------
+ */
+static Datum
+ri_on_delete_restrict(TriggerData *trigdata, bool is_no_action)
+{
+ const RI_ConstraintInfo *riinfo;
+ Relation fk_rel;
+ Relation pk_rel;
+ SPIPlanPtr qplan;
+ StringInfoData querybuf;
+ StringInfoData qualbuf;
+ char fkrelname[MAX_QUOTED_REL_NAME_LEN];
+ char t_attname[MAX_QUOTED_NAME_LEN];
+ char d_attname[MAX_QUOTED_NAME_LEN];
+ const char *fk_only;
+ Snapshot test_snapshot = InvalidSnapshot;
+ Snapshot crosscheck_snapshot = InvalidSnapshot;
+ Oid save_userid;
+ int save_sec_context;
+ int i;
+ int spi_result;
+ int save_nestlevel;
+ char workmembuf[32];
+
+ /*
+ * Get arguments.
+ */
+ riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
+ trigdata->tg_relation, true);
+
+ /*
+ * If there are no rows in the transition table, then there is no need
+ * to do the trigger operation.
+ */
+ if (tuplestore_tuple_count(trigdata->tg_oldtable) == 0)
+ return PointerGetDatum(NULL);
+
+ /*
+ * Get the relation descriptors of the FK and PK tables and the old tuple.
+ *
+ * fk_rel is opened in RowShareLock mode since that's what our eventual
+ * SELECT FOR KEY SHARE will get on it.
+ */
+ fk_rel = heap_open(riinfo->fk_relid, RowShareLock);
+ pk_rel = trigdata->tg_relation;
+
+ switch (riinfo->confmatchtype)
+ {
+ /* ----------
+ * SQL:2008 15.17 <Execution of referential actions>
+ * General rules 9) a) iv):
+ * MATCH SIMPLE/FULL
+ * ... ON DELETE RESTRICT
+ * ----------
+ */
+ case FKCONSTR_MATCH_SIMPLE:
+ case FKCONSTR_MATCH_FULL:
+ /*
+ * This is a delete trigger, so if a PK value was deleted there
+ * can be no possible replacement PK row
+ */
+
+ /* ----------
+ * The query string built is
+ * SELECT t.fkatt1, [ t.fkatt2 ...]
+ * FROM <transition_table> d
+ * JOIN <fktable> t ON t.fkatt1 = d.fkatt1 [ AND ... ]
+ * LIMIT 1 FOR KEY SHARE OF t
+ * ----------
+ */
+
+ initStringInfo(&querybuf);
+ initStringInfo(&qualbuf);
+
+ fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+ "" : "ONLY ";
+
+ appendStringInfo(&querybuf, "SELECT ");
+ quoteRelationName(fkrelname, fk_rel);
+ appendStringInfo(&qualbuf, " FROM %s%s d JOIN %s t ON ",
+ fk_only, trigdata->tg_trigger->tgoldtable,
+ fkrelname);
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (i > 0)
+ {
+ appendStringInfo(&querybuf, ", ");
+ appendStringInfo(&qualbuf, " AND ");
+ }
+ quoteOneName(t_attname,
+ RIAttName(fk_rel, riinfo->fk_attnums[i]));
+ quoteOneName(d_attname,
+ RIAttName(pk_rel, riinfo->pk_attnums[i]));
+ appendStringInfo(&querybuf, "d.%s", d_attname);
+ appendStringInfo(&qualbuf,
+ "t.%s = d.%s", t_attname, d_attname);
+ }
+ appendStringInfoString(&querybuf, qualbuf.data);
+ appendStringInfoString(&querybuf,
+ " LIMIT 1");
+ //" LIMIT 1 FOR KEY SHARE OF t");
+
+ /* Switch to proper UID to perform check as */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(fk_rel)->relowner,
+ save_sec_context |
+ SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ save_nestlevel = NewGUCNestLevel();
+ snprintf(workmembuf, sizeof(workmembuf), "%d", maintenance_work_mem);
+ (void) set_config_option("work_mem", workmembuf,
+ PGC_USERSET, PGC_S_SESSION,
+ GUC_ACTION_SAVE, true, 0, false);
+ if (SPI_connect() != SPI_OK_CONNECT)
+ elog(ERROR, "SPI_connect failed");
+ if (SPI_register_trigger_data(trigdata) != SPI_OK_TD_REGISTER)
+ elog(ERROR, "SPI_register_trigger_data failed");
+ qplan = SPI_prepare(querybuf.data, 0, NULL);
+ if (qplan == NULL)
+ elog(ERROR, "SPI_prepare returned %s for %s",
+ SPI_result_code_string(SPI_result), querybuf.data);
+
+ /*
+ * In READ COMMITTED mode, we just need to use an up-to-date regular
+ * snapshot, and we will see all rows that could be interesting. But in
+ * transaction-snapshot mode, we can't change the transaction snapshot. If
+ * the caller passes detectNewRows == false then it's okay to do the query
+ * with the transaction snapshot; otherwise we use a current snapshot, and
+ * tell the executor to error out if it finds any rows under the current
+ * snapshot that wouldn't be visible per the transaction snapshot. Note
+ * that SPI_execute_snapshot will register the snapshots, so we don't need
+ * to bother here.
+ */
+ if (IsolationUsesXactSnapshot())
+ {
+ CommandCounterIncrement(); /* be sure all my own work is visible */
+ test_snapshot = GetLatestSnapshot();
+ crosscheck_snapshot = GetTransactionSnapshot();
+ }
+ else
+ {
+ /* the default SPI behavior is okay */
+ test_snapshot = InvalidSnapshot;
+ crosscheck_snapshot = InvalidSnapshot;
+ }
+
+ /* Finally we can run the query. */
+ spi_result = SPI_execute_snapshot(qplan,
+ NULL, NULL,
+ test_snapshot, crosscheck_snapshot,
+ false, false, 1);
+
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
+
+ /* Check result */
+ if (spi_result < 0)
+ elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
+
+ if (SPI_processed > 0)
+ {
+ TupleDesc tupdesc = SPI_tuptable->tupdesc;
+ HeapTuple tuple = SPI_tuptable->vals[0];
+ StringInfoData key_names;
+ StringInfoData key_values;
+ int i;
+ char *name,
+ *val;
+
+ initStringInfo(&key_names);
+ initStringInfo(&key_values);
+ for (i = 1; i <= tupdesc->natts; i++)
+ {
+ if (i > 1)
+ {
+ appendStringInfo(&key_names, ", ");
+ appendStringInfo(&key_values, ", ");
+ }
+ name = SPI_fname(tupdesc, i);
+ val = SPI_getvalue(tuple, tupdesc, i);
+ if (!val)
+ val = "NULL";
+ appendStringInfo(&key_names, "%s", name);
+ appendStringInfo(&key_values, "%s", val);
+ }
+ /* TODO: Fix error message to reflect that this is only
+ ever a delete */
+ ereport(ERROR,
+ (errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
+ errmsg("update or delete on table \"%s\" violates foreign key constraint \"%s\" on table \"%s\"",
+ RelationGetRelationName(pk_rel),
+ NameStr(riinfo->conname),
+ RelationGetRelationName(fk_rel)),
+ errdetail("Key (%s)=(%s) is still referenced from table \"%s\".",
+ key_names.data, key_values.data,
+ RelationGetRelationName(fk_rel)),
+ errtableconstraint(fk_rel, NameStr(riinfo->conname))));
+ }
+
+ if (SPI_finish() != SPI_OK_FINISH)
+ elog(ERROR, "SPI_finish failed");
+ AtEOXact_GUC(true, save_nestlevel);
+
+ heap_close(fk_rel, RowShareLock);
+
+ return PointerGetDatum(NULL);
+
+ /*
+ * Handle MATCH PARTIAL restrict delete
+ */
+ case FKCONSTR_MATCH_PARTIAL:
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("MATCH PARTIAL not yet implemented")));
+ return PointerGetDatum(NULL);
+
+ default:
+ elog(ERROR, "unrecognized confmatchtype: %d",
+ riinfo->confmatchtype);
+ break;
+ }
+
+ /* Never reached */
+ return PointerGetDatum(NULL);
+}
+
+/* ----------
+ * ri_on_update_restrict -
*
* Common code for ON DELETE RESTRICT, ON DELETE NO ACTION,
* ON UPDATE RESTRICT, and ON UPDATE NO ACTION.
* ----------
*/
static Datum
-ri_restrict(TriggerData *trigdata, bool is_no_action)
+ri_on_update_restrict(TriggerData *trigdata, bool is_no_action)
{
const RI_ConstraintInfo *riinfo;
Relation fk_rel;
@@ -813,7 +1045,7 @@ ri_restrict(TriggerData *trigdata, bool is_no_action)
return PointerGetDatum(NULL);
/*
- * Handle MATCH PARTIAL restrict delete or update.
+ * Handle MATCH PARTIAL restrict update.
*/
case FKCONSTR_MATCH_PARTIAL:
ereport(ERROR,
@@ -845,10 +1077,20 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
const RI_ConstraintInfo *riinfo;
Relation fk_rel;
Relation pk_rel;
- HeapTuple old_row;
- RI_QueryKey qkey;
SPIPlanPtr qplan;
int i;
+ Snapshot test_snapshot;
+ Snapshot crosscheck_snapshot;
+ int spi_result;
+ Oid save_userid;
+ int save_sec_context;
+ StringInfoData querybuf;
+ char fkrelname[MAX_QUOTED_REL_NAME_LEN];
+ char t_attname[MAX_QUOTED_NAME_LEN];
+ char d_attname[MAX_QUOTED_NAME_LEN];
+ const char *fk_only;
+ int save_nestlevel;
+ char workmembuf[32];
/*
* Check that this is a valid trigger call on the right time and event.
@@ -862,14 +1104,21 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
trigdata->tg_relation, true);
/*
- * Get the relation descriptors of the FK and PK tables and the old tuple.
+ * If there are no rows in the transition table, then there is no need
+ * to do the trigger operation.
+ */
+ if (tuplestore_tuple_count(trigdata->tg_oldtable) == 0)
+ return PointerGetDatum(NULL);
+
+ /*
+ * Get the relation descriptors of the FK and PK tables and the
+ * transition_table
*
* fk_rel is opened in RowExclusiveLock mode since that's what our
* eventual DELETE will get on it.
*/
fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
pk_rel = trigdata->tg_relation;
- old_row = trigdata->tg_trigtuple;
switch (riinfo->confmatchtype)
{
@@ -882,71 +1131,107 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
*/
case FKCONSTR_MATCH_SIMPLE:
case FKCONSTR_MATCH_FULL:
+ /* ----------
+ * The query string built is
+ * DELETE FROM [ONLY] <fktable> a
+ * USING <transition-table> t
+ * WHERE t.fkatt1 = a.fkatt1 [AND ...]
+ * ----------
+ */
+ initStringInfo(&querybuf);
+ fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+ "" : "ONLY ";
+ quoteRelationName(fkrelname, fk_rel);
+ appendStringInfo(&querybuf,
+ "DELETE FROM %s%s t USING %s d WHERE ",
+ fk_only, fkrelname,
+ trigdata->tg_trigger->tgoldtable);
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+
+ if (i > 0)
+ appendStringInfo(&querybuf," AND ");
+
+ quoteOneName(t_attname,
+ RIAttName(fk_rel, riinfo->fk_attnums[i]));
+ quoteOneName(d_attname,
+ RIAttName(pk_rel, riinfo->pk_attnums[i]));
+ appendStringInfo(&querybuf,
+ "t.%s = d.%s", t_attname, d_attname);
+ }
+
+ /* Switch to proper UID to perform check as */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(fk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ save_nestlevel = NewGUCNestLevel();
+ snprintf(workmembuf, sizeof(workmembuf), "%d", maintenance_work_mem);
+ (void) set_config_option("work_mem", workmembuf,
+ PGC_USERSET, PGC_S_SESSION,
+ GUC_ACTION_SAVE, true, 0, false);
if (SPI_connect() != SPI_OK_CONNECT)
elog(ERROR, "SPI_connect failed");
+ if (SPI_register_trigger_data(trigdata) != SPI_OK_TD_REGISTER)
+ elog(ERROR, "SPI_register_trigger_data failed");
+
+ /* Create the plan */
+ qplan = SPI_prepare(querybuf.data, 0, NULL);
+
+ if (qplan == NULL)
+ elog(ERROR, "SPI_prepare returned %s for %s",
+ SPI_result_code_string(SPI_result), querybuf.data);
/*
- * Fetch or prepare a saved plan for the cascaded delete
+ * In READ COMMITTED mode, we just need to use an up-to-date regular
+ * snapshot, and we will see all rows that could be interesting. But in
+ * transaction-snapshot mode, we can't change the transaction snapshot. If
+ * the caller passes detectNewRows == false then it's okay to do the query
+ * with the transaction snapshot; otherwise we use a current snapshot, and
+ * tell the executor to error out if it finds any rows under the current
+ * snapshot that wouldn't be visible per the transaction snapshot. Note
+ * that SPI_execute_snapshot will register the snapshots, so we don't need
+ * to bother here.
*/
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_CASCADE_DEL_DODELETE);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+ if (IsolationUsesXactSnapshot())
{
- StringInfoData querybuf;
- char fkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- Oid queryoids[RI_MAX_NUMKEYS];
- const char *fk_only;
+ CommandCounterIncrement(); /* be sure all my own work is visible */
+ test_snapshot = GetLatestSnapshot();
+ crosscheck_snapshot = GetTransactionSnapshot();
+ }
+ else
+ {
+ /* the default SPI behavior is okay */
+ test_snapshot = InvalidSnapshot;
+ crosscheck_snapshot = InvalidSnapshot;
+ }
- /* ----------
- * The query string built is
- * DELETE FROM [ONLY] <fktable> WHERE $1 = fkatt1 [AND ...]
- * The type id's for the $ parameters are those of the
- * corresponding PK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(fkrelname, fk_rel);
- appendStringInfo(&querybuf, "DELETE FROM %s%s",
- fk_only, fkrelname);
- querysep = "WHERE";
- for (i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ /* Finally we can run the query. */
+ spi_result = SPI_execute_snapshot(qplan,
+ NULL, NULL,
+ test_snapshot, crosscheck_snapshot,
+ false, false, 0);
- quoteOneName(attname,
- RIAttName(fk_rel, riinfo->fk_attnums[i]));
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&querybuf, querysep,
- paramname, pk_type,
- riinfo->pf_eq_oprs[i],
- attname, fk_type);
- querysep = "AND";
- queryoids[i] = pk_type;
- }
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel, true);
- }
+ /* Check result */
+ if (spi_result < 0)
+ elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
- /*
- * We have a plan now. Build up the arguments from the key values
- * in the deleted PK tuple and delete the referencing rows
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- old_row, NULL,
- true, /* must detect new rows */
- SPI_OK_DELETE);
+ if (spi_result != SPI_OK_DELETE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
+ RelationGetRelationName(pk_rel),
+ NameStr(riinfo->conname),
+ RelationGetRelationName(fk_rel)),
+ errhint("This is most likely due to a rule having rewritten the query.")));
if (SPI_finish() != SPI_OK_FINISH)
elog(ERROR, "SPI_finish failed");
+ AtEOXact_GUC(true, save_nestlevel);
table_close(fk_rel, RowExclusiveLock);
@@ -1141,49 +1426,34 @@ RI_FKey_setnull_del(PG_FUNCTION_ARGS)
* Check that this is a valid trigger call on the right time and event.
*/
ri_CheckTrigger(fcinfo, "RI_FKey_setnull_del", RI_TRIGTYPE_DELETE);
-
- /*
- * Share code with UPDATE case
- */
- return ri_setnull((TriggerData *) fcinfo->context);
-}
-
-/* ----------
- * RI_FKey_setnull_upd -
- *
- * Set foreign key references to NULL at update event on PK table.
- * ----------
- */
-Datum
-RI_FKey_setnull_upd(PG_FUNCTION_ARGS)
-{
/*
- * Check that this is a valid trigger call on the right time and event.
+ * Share code with DELETE SET DEFAULT case
*/
- ri_CheckTrigger(fcinfo, "RI_FKey_setnull_upd", RI_TRIGTYPE_UPDATE);
-
- /*
- * Share code with DELETE case
- */
- return ri_setnull((TriggerData *) fcinfo->context);
+ return ri_on_delete_set((TriggerData *) fcinfo->context, true);
}
-/* ----------
- * ri_setnull -
- *
- * Common code for ON DELETE SET NULL and ON UPDATE SET NULL
- * ----------
- */
static Datum
-ri_setnull(TriggerData *trigdata)
+ri_on_delete_set(TriggerData *trigdata, bool set_null)
{
const RI_ConstraintInfo *riinfo;
Relation fk_rel;
Relation pk_rel;
- HeapTuple old_row;
- RI_QueryKey qkey;
SPIPlanPtr qplan;
+ StringInfoData querybuf;
+ StringInfoData qualbuf;
+ char fkrelname[MAX_QUOTED_REL_NAME_LEN];
+ char t_attname[MAX_QUOTED_NAME_LEN];
+ char d_attname[MAX_QUOTED_NAME_LEN];
+ const char *fk_only;
+ const char *set_to = set_null ? "NULL" : "DEFAULT";
int i;
+ Oid save_userid;
+ int save_sec_context;
+ Snapshot test_snapshot;
+ Snapshot crosscheck_snapshot;
+ int spi_result;
+ int save_nestlevel;
+ char workmembuf[32];
/*
* Get arguments.
@@ -1191,15 +1461,21 @@ ri_setnull(TriggerData *trigdata)
riinfo = ri_FetchConstraintInfo(trigdata->tg_trigger,
trigdata->tg_relation, true);
+ /*
+ * If there are no rows in the transition table, then there is no need
+ * to do the trigger operation.
+ */
+ if (tuplestore_tuple_count(trigdata->tg_oldtable) == 0)
+ return PointerGetDatum(NULL);
+
/*
* Get the relation descriptors of the FK and PK tables and the old tuple.
*
* fk_rel is opened in RowExclusiveLock mode since that's what our
* eventual UPDATE will get on it.
*/
- fk_rel = table_open(riinfo->fk_relid, RowExclusiveLock);
+ fk_rel = heap_open(riinfo->fk_relid, RowExclusiveLock);
pk_rel = trigdata->tg_relation;
- old_row = trigdata->tg_trigtuple;
switch (riinfo->confmatchtype)
{
@@ -1207,96 +1483,122 @@ ri_setnull(TriggerData *trigdata)
* SQL:2008 15.17 <Execution of referential actions>
* General rules 9) a) ii):
* MATCH SIMPLE/FULL
- * ... ON DELETE SET NULL
- * General rules 10) a) ii):
- * MATCH SIMPLE/FULL
- * ... ON UPDATE SET NULL
+ * ... ON DELETE SET (NULL|DEFAULT)
* ----------
*/
case FKCONSTR_MATCH_SIMPLE:
case FKCONSTR_MATCH_FULL:
+ /* ----------
+ * The query string built is
+ * UPDATE [ONLY] <fktable> AS t
+ * SET fkatt1 = (NULL|DEFAULT) [, ...]
+ * FROM <transition-table> AS d
+ * WHERE t.fkatt1 = d.fkatt1 [AND ...]
+ * ----------
+ */
+ initStringInfo(&querybuf);
+ initStringInfo(&qualbuf);
+ fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
+ "" : "ONLY ";
+ quoteRelationName(fkrelname, fk_rel);
+ appendStringInfo(&querybuf, "UPDATE %s%s AS t SET ",
+ fk_only, fkrelname);
+ appendStringInfo(&qualbuf, " FROM %s AS d WHERE ",
+ trigdata->tg_trigger->tgoldtable);
+ for (i = 0; i < riinfo->nkeys; i++)
+ {
+ if (i > 0)
+ {
+ appendStringInfo(&querybuf, ", ");
+ appendStringInfo(&qualbuf, " AND ");
+ }
+ quoteOneName(t_attname,
+ RIAttName(fk_rel, riinfo->fk_attnums[i]));
+ quoteOneName(d_attname,
+ RIAttName(pk_rel, riinfo->pk_attnums[i]));
+ appendStringInfo(&querybuf, "%s = %s",
+ t_attname, set_to);
+ appendStringInfo(&qualbuf, "t.%s = d.%s",
+ t_attname, d_attname);
+ }
+ appendStringInfoString(&querybuf, qualbuf.data);
+
+ /* Switch to proper UID to perform check as */
+ GetUserIdAndSecContext(&save_userid, &save_sec_context);
+ SetUserIdAndSecContext(RelationGetForm(fk_rel)->relowner,
+ save_sec_context | SECURITY_LOCAL_USERID_CHANGE |
+ SECURITY_NOFORCE_RLS);
+
+ save_nestlevel = NewGUCNestLevel();
+ snprintf(workmembuf, sizeof(workmembuf), "%d", maintenance_work_mem);
+ (void) set_config_option("work_mem", workmembuf,
+ PGC_USERSET, PGC_S_SESSION,
+ GUC_ACTION_SAVE, true, 0, false);
if (SPI_connect() != SPI_OK_CONNECT)
elog(ERROR, "SPI_connect failed");
+ if (SPI_register_trigger_data(trigdata) != SPI_OK_TD_REGISTER)
+ elog(ERROR, "SPI_register_trigger_data failed");
+ qplan = SPI_prepare(querybuf.data, 0, NULL);
+ if (qplan == NULL)
+ elog(ERROR, "SPI_prepare returned %s for %s",
+ SPI_result_code_string(SPI_result), querybuf.data);
/*
- * Fetch or prepare a saved plan for the set null operation (it's
- * the same query for delete and update cases)
+ * In READ COMMITTED mode, we just need to use an up-to-date regular
+ * snapshot, and we will see all rows that could be interesting. But in
+ * transaction-snapshot mode, we can't change the transaction snapshot. If
+ * the caller passes detectNewRows == false then it's okay to do the query
+ * with the transaction snapshot; otherwise we use a current snapshot, and
+ * tell the executor to error out if it finds any rows under the current
+ * snapshot that wouldn't be visible per the transaction snapshot. Note
+ * that SPI_execute_snapshot will register the snapshots, so we don't need
+ * to bother here.
*/
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_SETNULL_DOUPDATE);
-
- if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
+ if (IsolationUsesXactSnapshot())
{
- StringInfoData querybuf;
- StringInfoData qualbuf;
- char fkrelname[MAX_QUOTED_REL_NAME_LEN];
- char attname[MAX_QUOTED_NAME_LEN];
- char paramname[16];
- const char *querysep;
- const char *qualsep;
- const char *fk_only;
- Oid queryoids[RI_MAX_NUMKEYS];
+ CommandCounterIncrement(); /* be sure all my own work is visible */
+ test_snapshot = GetLatestSnapshot();
+ crosscheck_snapshot = GetTransactionSnapshot();
+ }
+ else
+ {
+ /* the default SPI behavior is okay */
+ test_snapshot = InvalidSnapshot;
+ crosscheck_snapshot = InvalidSnapshot;
+ }
- /* ----------
- * The query string built is
- * UPDATE [ONLY] <fktable> SET fkatt1 = NULL [, ...]
- * WHERE $1 = fkatt1 [AND ...]
- * The type id's for the $ parameters are those of the
- * corresponding PK attributes.
- * ----------
- */
- initStringInfo(&querybuf);
- initStringInfo(&qualbuf);
- fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
- "" : "ONLY ";
- quoteRelationName(fkrelname, fk_rel);
- appendStringInfo(&querybuf, "UPDATE %s%s SET",
- fk_only, fkrelname);
- querysep = "";
- qualsep = "WHERE";
- for (i = 0; i < riinfo->nkeys; i++)
- {
- Oid pk_type = RIAttType(pk_rel, riinfo->pk_attnums[i]);
- Oid fk_type = RIAttType(fk_rel, riinfo->fk_attnums[i]);
+ /* Finally we can run the query. */
+ spi_result = SPI_execute_snapshot(qplan,
+ NULL, NULL,
+ test_snapshot, crosscheck_snapshot,
+ false, false, 1);
- quoteOneName(attname,
- RIAttName(fk_rel, riinfo->fk_attnums[i]));
- appendStringInfo(&querybuf,
- "%s %s = NULL",
- querysep, attname);
- sprintf(paramname, "$%d", i + 1);
- ri_GenerateQual(&qualbuf, qualsep,
- paramname, pk_type,
- riinfo->pf_eq_oprs[i],
- attname, fk_type);
- querysep = ",";
- qualsep = "AND";
- queryoids[i] = pk_type;
- }
- appendStringInfoString(&querybuf, qualbuf.data);
+ /* Restore UID and security context */
+ SetUserIdAndSecContext(save_userid, save_sec_context);
- /* Prepare and save the plan */
- qplan = ri_PlanCheck(querybuf.data, riinfo->nkeys, queryoids,
- &qkey, fk_rel, pk_rel, true);
- }
+ /* Check result */
+ if (spi_result < 0)
+ elog(ERROR, "SPI_execute_snapshot returned %s", SPI_result_code_string(spi_result));
- /*
- * We have a plan now. Run it to update the existing references.
- */
- ri_PerformCheck(riinfo, &qkey, qplan,
- fk_rel, pk_rel,
- old_row, NULL,
- true, /* must detect new rows */
- SPI_OK_UPDATE);
+ if (spi_result != SPI_OK_UPDATE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("referential integrity query on \"%s\" from constraint \"%s\" on \"%s\" gave unexpected result",
+ RelationGetRelationName(pk_rel),
+ NameStr(riinfo->conname),
+ RelationGetRelationName(fk_rel)),
+ errhint("This is most likely due to a rule having rewritten the query.")));
if (SPI_finish() != SPI_OK_FINISH)
elog(ERROR, "SPI_finish failed");
+ AtEOXact_GUC(true, save_nestlevel);
- table_close(fk_rel, RowExclusiveLock);
+ heap_close(fk_rel, RowExclusiveLock);
return PointerGetDatum(NULL);
/*
- * Handle MATCH PARTIAL set null delete or update.
+ * Handle MATCH PARTIAL set null delete
*/
case FKCONSTR_MATCH_PARTIAL:
ereport(ERROR,
@@ -1316,53 +1618,30 @@ ri_setnull(TriggerData *trigdata)
/* ----------
- * RI_FKey_setdefault_del -
- *
- * Set foreign key references to defaults at delete event on PK table.
- * ----------
- */
-Datum
-RI_FKey_setdefault_del(PG_FUNCTION_ARGS)
-{
- /*
- * Check that this is a valid trigger call on the right time and event.
- */
- ri_CheckTrigger(fcinfo, "RI_FKey_setdefault_del", RI_TRIGTYPE_DELETE);
-
- /*
- * Share code with UPDATE case
- */
- return ri_setdefault((TriggerData *) fcinfo->context);
-}
-
-/* ----------
- * RI_FKey_setdefault_upd -
+ * RI_FKey_setnull_upd -
*
- * Set foreign key references to defaults at update event on PK table.
+ * Set foreign key references to NULL at update event on PK table.
* ----------
*/
Datum
-RI_FKey_setdefault_upd(PG_FUNCTION_ARGS)
+RI_FKey_setnull_upd(PG_FUNCTION_ARGS)
{
/*
* Check that this is a valid trigger call on the right time and event.
*/
- ri_CheckTrigger(fcinfo, "RI_FKey_setdefault_upd", RI_TRIGTYPE_UPDATE);
+ ri_CheckTrigger(fcinfo, "RI_FKey_setnull_upd", RI_TRIGTYPE_UPDATE);
- /*
- * Share code with DELETE case
- */
- return ri_setdefault((TriggerData *) fcinfo->context);
+ return ri_on_update_set((TriggerData *) fcinfo->context, true);
}
/* ----------
- * ri_setdefault -
+ * ri_on_update_set -
*
- * Common code for ON DELETE SET DEFAULT and ON UPDATE SET DEFAULT
+ * Common code for ON UPDATE SET NULL / ON UPDATE SET DEFAULT
* ----------
*/
static Datum
-ri_setdefault(TriggerData *trigdata)
+ri_on_update_set(TriggerData *trigdata, bool set_null)
{
const RI_ConstraintInfo *riinfo;
Relation fk_rel;
@@ -1370,6 +1649,10 @@ ri_setdefault(TriggerData *trigdata)
HeapTuple old_row;
RI_QueryKey qkey;
SPIPlanPtr qplan;
+ int i;
+ const char *set_to = set_null ? "NULL" : "DEFAULT";
+ int32 set_type = set_null ? RI_PLAN_SETNULL_DOUPDATE :
+ RI_PLAN_SETDEFAULT_DOUPDATE;
/*
* Get arguments.
@@ -1391,12 +1674,9 @@ ri_setdefault(TriggerData *trigdata)
{
/* ----------
* SQL:2008 15.17 <Execution of referential actions>
- * General rules 9) a) iii):
- * MATCH SIMPLE/FULL
- * ... ON DELETE SET DEFAULT
- * General rules 10) a) iii):
+ * General rules 10) a) ii):
* MATCH SIMPLE/FULL
- * ... ON UPDATE SET DEFAULT
+ * ... ON UPDATE SET (NULL|DEFAULT)
* ----------
*/
case FKCONSTR_MATCH_SIMPLE:
@@ -1405,10 +1685,10 @@ ri_setdefault(TriggerData *trigdata)
elog(ERROR, "SPI_connect failed");
/*
- * Fetch or prepare a saved plan for the set default operation
- * (it's the same query for delete and update cases)
+ * Fetch or prepare a saved plan for the set null operation (it's
+ * the same query for delete and update cases)
*/
- ri_BuildQueryKey(&qkey, riinfo, RI_PLAN_SETDEFAULT_DOUPDATE);
+ ri_BuildQueryKey(&qkey, riinfo, set_type);
if ((qplan = ri_FetchPreparedPlan(&qkey)) == NULL)
{
@@ -1419,13 +1699,12 @@ ri_setdefault(TriggerData *trigdata)
char paramname[16];
const char *querysep;
const char *qualsep;
- Oid queryoids[RI_MAX_NUMKEYS];
const char *fk_only;
- int i;
+ Oid queryoids[RI_MAX_NUMKEYS];
/* ----------
* The query string built is
- * UPDATE [ONLY] <fktable> SET fkatt1 = DEFAULT [, ...]
+ * UPDATE [ONLY] <fktable> SET fkatt1 = (NULL|DEFULT) [, ...]
* WHERE $1 = fkatt1 [AND ...]
* The type id's for the $ parameters are those of the
* corresponding PK attributes.
@@ -1433,9 +1712,9 @@ ri_setdefault(TriggerData *trigdata)
*/
initStringInfo(&querybuf);
initStringInfo(&qualbuf);
- quoteRelationName(fkrelname, fk_rel);
fk_only = fk_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ?
"" : "ONLY ";
+ quoteRelationName(fkrelname, fk_rel);
appendStringInfo(&querybuf, "UPDATE %s%s SET",
fk_only, fkrelname);
querysep = "";
@@ -1448,8 +1727,8 @@ ri_setdefault(TriggerData *trigdata)
quoteOneName(attname,
RIAttName(fk_rel, riinfo->fk_attnums[i]));
appendStringInfo(&querybuf,
- "%s %s = DEFAULT",
- querysep, attname);
+ "%s %s = %s",
+ querysep, attname, set_to);
sprintf(paramname, "$%d", i + 1);
ri_GenerateQual(&qualbuf, qualsep,
paramname, pk_type,
@@ -1494,10 +1773,12 @@ ri_setdefault(TriggerData *trigdata)
* of an UPDATE, while SET NULL is certain to result in rows that
* satisfy the FK constraint.)
*/
- return ri_restrict(trigdata, true);
+ if (!set_null)
+ return ri_on_update_restrict(trigdata, true);
+ return PointerGetDatum(NULL);
/*
- * Handle MATCH PARTIAL set default delete or update.
+ * Handle MATCH PARTIAL set (NULL|DEFAULT) update.
*/
case FKCONSTR_MATCH_PARTIAL:
ereport(ERROR,
@@ -1516,6 +1797,46 @@ ri_setdefault(TriggerData *trigdata)
}
+/* ----------
+ * RI_FKey_setdefault_del -
+ *
+ * Set foreign key references to defaults at delete event on PK table.
+ * ----------
+ */
+Datum
+RI_FKey_setdefault_del(PG_FUNCTION_ARGS)
+{
+ /*
+ * Check that this is a valid trigger call on the right time and event.
+ */
+ ri_CheckTrigger(fcinfo, "RI_FKey_setdefault_del", RI_TRIGTYPE_DELETE);
+
+ /*
+ * Share code with DELETE SET NULL case
+ */
+ return ri_on_delete_set((TriggerData *) fcinfo->context, false);
+}
+
+/* ----------
+ * RI_FKey_setdefault_upd -
+ *
+ * Set foreign key references to defaults at update event on PK table.
+ * ----------
+ */
+Datum
+RI_FKey_setdefault_upd(PG_FUNCTION_ARGS)
+{
+ /*
+ * Check that this is a valid trigger call on the right time and event.
+ */
+ ri_CheckTrigger(fcinfo, "RI_FKey_setdefault_upd", RI_TRIGTYPE_UPDATE);
+
+ /*
+ * Share code with DELETE case
+ */
+ return ri_on_update_set((TriggerData *) fcinfo->context, false);
+}
+
/* ----------
* RI_FKey_pk_upd_check_required -
*
@@ -1534,6 +1855,12 @@ RI_FKey_pk_upd_check_required(Trigger *trigger, Relation pk_rel,
{
const RI_ConstraintInfo *riinfo;
+ /*
+ * Statement level triggers must always fire
+ */
+ if ((old_row == NULL) && (new_row == NULL))
+ return true;
+
/*
* Get arguments.
*/
@@ -1543,7 +1870,6 @@ RI_FKey_pk_upd_check_required(Trigger *trigger, Relation pk_rel,
{
case FKCONSTR_MATCH_SIMPLE:
case FKCONSTR_MATCH_FULL:
-
/*
* If any old key value is NULL, the row could not have been
* referenced by an FK row, so no check is needed.
@@ -2113,27 +2439,36 @@ ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
/*
* Check proper event
*/
- if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
- !TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
- ereport(ERROR,
- (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
- errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
-
switch (tgkind)
{
case RI_TRIGTYPE_INSERT:
+ if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
+ !TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
+ ereport(ERROR,
+ (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+ errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
ereport(ERROR,
(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
errmsg("function \"%s\" must be fired for INSERT", funcname)));
break;
case RI_TRIGTYPE_UPDATE:
+ if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
+ !TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
+ ereport(ERROR,
+ (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+ errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
if (!TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
ereport(ERROR,
(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
errmsg("function \"%s\" must be fired for UPDATE", funcname)));
break;
case RI_TRIGTYPE_DELETE:
+ if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
+ !TRIGGER_FIRED_FOR_STATEMENT(trigdata->tg_event))
+ ereport(ERROR,
+ (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
+ errmsg("function \"%s\" must be fired AFTER STATEMENT", funcname)));
if (!TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
ereport(ERROR,
(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
--
2.17.1
Corey Huinker <corey.huinker@gmail.com> wrote:
Attached is a patch that refactors DELETE triggers to fire at the statement level.
I chose delete triggers partly out of simplicity, and partly because there
some before/after row linkage in the ON UPDATE CASCADE cases where statement
level triggers might not be feasible as we have currently implemented them.
I tried to review this patch, also with the intention to learn more about
AFTER triggers internals. As for the code, I understood your idea and it
really seems like low hanging fruit. However in trigger.c I noticed a comment
that transition table is not supported for deferred constraints. Of course I
tried to use this information to test your patch:
CREATE TABLE t_pk(i int PRIMARY KEY);
CREATE TABLE t_fk(i int REFERENCES t_pk ON DELETE NO ACTION DEFERRABLE);
INSERT INTO t_pk(i) VALUES (1);
INSERT INTO t_fk(i) VALUES (1);
BEGIN;
SET CONSTRAINTS t_fk_i_fkey DEFERRED;
DELETE FROM t_pk;
COMMIT;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Missing transition table really appears to be the problem:
#0 0x0000000000a8c2a8 in tuplestore_tuple_count (state=0x7f7f7f7f7f7f7f7f) at tuplestore.c:548
#1 0x00000000009b0e58 in ri_on_delete_restrict (trigdata=0x7ffe800c2890, is_no_action=true) at ri_triggers.c:720
#2 0x00000000009b0d3f in RI_FKey_noaction_del (fcinfo=0x7ffe800c27a0) at ri_triggers.c:607
#3 0x00000000006b8768 in ExecCallTriggerFunc (trigdata=0x7ffe800c2890, tgindx=0, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690) at trigger.c:2412
#4 0x00000000006bbb86 in AfterTriggerExecute (event=0x1f0ab10, rel=0x7fc71306ea70, trigdesc=0x1f089f8, finfo=0x1f08c10, instr=0x0, per_tuple_context=0x1f30690, trig_tuple_slot1=0x0, trig_tuple_slot2=0x0)
at trigger.c:4367
#5 0x00000000006bbf9e in afterTriggerInvokeEvents (events=0xf97950 <afterTriggers+16>, firing_id=1, estate=0x1f086c8, delete_ok=true) at trigger.c:4560
#6 0x00000000006bc844 in AfterTriggerFireDeferred () at trigger.c:4996
#7 0x000000000055c989 in CommitTransaction () at xact.c:1987
#8 0x000000000055d6a4 in CommitTransactionCommand () at xact.c:2832
While the idea to use the transition table is good, this approach probably
requires the trigger engine (trigger.c) to be adjusted, and that in a
non-trivial way.
I'm also not sure if it's o.k. that performance related patch potentially
makes performance worse in some cases. If FK violations are checked at
statement boundary, the wasted effort / time can (at least in theory) be high
if early rows violate the FK.
Have you considered bulk processing of individual rows by row-level trigger?
For IMMEDIATE constraints we'd have to ensure that the trigger is notified
that the current row is the last one from the current query, but that might
not be difficult.
--
Antonin Houska
https://www.cybertec-postgresql.com
While the idea to use the transition table is good, this approach probably
requires the trigger engine (trigger.c) to be adjusted, and that in a
non-trivial way.
It probably does. Several people with advanced knowledge of trigger.c
expressed a desire to rebuild trigger.c from the ground up, and with it
create case-specific tuplestores for handling referential integrity
constraints, which would be lighter than either the transition tables or
the per-row invocation of a trigger. After all, we need a RI check to
happen, we don't need it to happen *through a trigger function*.
I'm also not sure if it's o.k. that performance related patch potentially
makes performance worse in some cases. If FK violations are checked at
statement boundary, the wasted effort / time can (at least in theory) be
high
if early rows violate the FK.
That concern was also expressed with varying levels of alarm in their
voices.
Have you considered bulk processing of individual rows by row-level trigger?
For IMMEDIATE constraints we'd have to ensure that the trigger is notified
that the current row is the last one from the current query, but that might
not be difficult.
I'm not sure I understand what you're suggesting, but if it keeps the
overhead of one trigger firing per row deleted, then it doesn't seem like
much of a win.
Given that this patch has been punted to v13, I'd like to instead look at
how we might go about building up the transition tuplestores for the
specific purpose of doing the RI checks, not just deletes, and executing
those at the appropriate time, rather than trying to make our needs fit
into trigger form.
Corey Huinker <corey.huinker@gmail.com> wrote:
Have you considered bulk processing of individual rows by row-level trigger?
For IMMEDIATE constraints we'd have to ensure that the trigger is notified
that the current row is the last one from the current query, but that might
not be difficult.I'm not sure I understand what you're suggesting, but if it keeps the
overhead of one trigger firing per row deleted, then it doesn't seem like
much of a win.
I thought of a trigger that still fires for each row, but most of the time it
only stores the row deleted into a tuplestore of the appropriate lifespan. The
queries that you proposed would only be executed if the tuplestore contains
given amount of tuples or if the last row of the current statement has been
stored.
Given that this patch has been punted to v13, I'd like to instead look at
how we might go about building up the transition tuplestores for the
specific purpose of doing the RI checks, not just deletes, and executing
those at the appropriate time, rather than trying to make our needs fit into
trigger form.
Constraint-specific tuplestore can make some things easier, but if table has
both constraints and (non-constraint) triggers which use the transition
tables, then the tuples will have to be stored in both tuplestores. On the
other hand, using the same tuplestore for both constraint and non-constraint
triggers is difficult because deferred constraint triggers need to see rows
added by all statements while the non-constraint triggers should only see rows
of the current statement.
In order to avoid per-row calls of the constraint trigger functions, we could
try to "aggregate" the constraint-specific events somehow, but I think a
separate queue would be needed for the constraint-specific events.
In general, the (after) triggers and constraints have too much in common, so
separation of these w/o seeing code changes is beyond my imagination.
--
Antonin Houska
https://www.cybertec-postgresql.com
In order to avoid per-row calls of the constraint trigger functions, we
could
try to "aggregate" the constraint-specific events somehow, but I think a
separate queue would be needed for the constraint-specific events.In general, the (after) triggers and constraints have too much in common,
so
separation of these w/o seeing code changes is beyond my imagination.
Yeah, there's a lot of potential for overlap where a trigger could "borrow"
an RI tuplestore or vice versa.
The people who expressed opinions on nuking triggers from orbit (it's the
only way to be sure) have yet to offer up any guidance on how to proceed
from here, and I suspect it's because they're all very busy getting things
ready for v12. I definitely have an interest in working on this for 13, but
I don't feel good about striking out on my own without their input.
On Tue, Feb 26, 2019 at 5:41 AM Corey Huinker <corey.huinker@gmail.com> wrote:
In order to avoid per-row calls of the constraint trigger functions, we could
try to "aggregate" the constraint-specific events somehow, but I think a
separate queue would be needed for the constraint-specific events.In general, the (after) triggers and constraints have too much in common, so
separation of these w/o seeing code changes is beyond my imagination.Yeah, there's a lot of potential for overlap where a trigger could "borrow" an RI tuplestore or vice versa.
The people who expressed opinions on nuking triggers from orbit (it's the only way to be sure) have yet to offer up any guidance on how to proceed from here, and I suspect it's because they're all very busy getting things ready for v12. I definitely have an interest in working on this for 13, but I don't feel good about striking out on my own without their input.
Very interesting thread, but the current patch has been through two
CFs without comments or new patches, so I'm going to mark it "Returned
with feedback". I hope all this discussion will trigger more research
in this space.
--
Thomas Munro
https://enterprisedb.com
The people who expressed opinions on nuking triggers from orbit (it's
the only way to be sure) have yet to offer up any guidance on how to
proceed from here, and I suspect it's because they're all very busy getting
things ready for v12. I definitely have an interest in working on this for
13, but I don't feel good about striking out on my own without their input.Very interesting thread, but the current patch has been through two
CFs without comments or new patches, so I'm going to mark it "Returned
with feedback". I hope all this discussion will trigger more research
in this space.
I've noticed that the zedstore efforts ran into the same problem that
refactoring triggers has: we cannot determine which columns in a table will
be affected by a trigger. so we have to assume that all of them will be.
This causes a lot of unnecessary overhead with triggers. If we had a
compilation step for triggers (which, ultimately means a compilation step
for procedures) which kept a dependency tree of which tables/columns were
touched, then we would have that insight. it's true that one dynamic
statement or SELECT * would force us right back to keep-everything, but if
procedures which did not do such things had performance benefits, that
would be an incentive to code them more fastidiously.