BEFORE ROW triggers for partitioned tables

Started by Alvaro Herreraalmost 6 years ago12 messages
#1Alvaro Herrera
alvherre@2ndquadrant.com
1 attachment(s)

Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy
-- just remove the check against them. With that, they work fine.

The main problem is that the executor is not prepared to re-route the
tuple if the user decides to change the partitioning columns (so you get
the error that the partitioning constraint is violated by the partition,
which makes no sense if you're inserting in the top-level partitioned
table). There are several views about that:

1. Just let it be. If the user does something silly, it's their problem
if they get an ugly error message.

2. If the partition keys are changed, raise an error. The trigger is
allowed to change everything but those columns. Then there's no
conflict, and it allows desirable use cases.

3. Allow the partition keys to change, as long as the tuple ends up in
the same partition. This is the same as (1) except the error message is
nicer.

The attached patch implements (2). The cases that are allowed by (3)
are a strict superset of those allowed by (2), so if we decide to allow
it in the future, it is possible without breaking anything that works
after implementing (2).

The implementation harnesses the newly added pg_trigger.tgparentid
column; if that is set to a non-zero value, then we search up the
partitioning hierarchy for each partitioning key column, and verify the
values are bitwise equal, up to the "root". Notes:

* We must check all levels, not just the one immediately above, because
the routing might involve crawling down several levels, and any of those
might become invalidated if the trigger changes values.

* The "root" is not necessarily the root partitioned table, but instead
it's the table that was named in the command. Because of this, we don't
need to acquire locks on the tables, since the executor already has the
tables open and locked (thus they cannot be modified by concurrent
commands).

* I find it a little odd that the leaf partition doesn't have any intel
on what its partitioning columns are. I guess they haven't been needed
thus far, and it seems inappropriate for this admittedly very small
feature to add such a burden on the rest of the system.

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined). Also, it has a performance issue,
namely that we do heap_getattr() potentially repeatedly -- maybe it'd be
better to "slotify" the tuple prior to doing the checks. Another
possible controversial point is that its location in commands/trigger.c
isn't great. (Frankly, I don't understand why the executor trigger
firing functions are in commands/ at all.)

Thoughts?

--
�lvaro Herrera

Attachments:

0001-Enable-BEFORE-row-level-triggers-for-partitioned-tab.patchtext/x-diff; charset=us-asciiDownload
From 20ad3320d9d5f16756d3fd0bba2db5df74117d35 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 26 Feb 2020 17:34:54 -0300
Subject: [PATCH] Enable BEFORE row-level triggers for partitioned tables

... with the limitation that if partitioning columns are changed, the
operation is aborted.  (The reason for this limitation is that it might
require re-routing the tuple, which doesn't work.)
---
 src/backend/commands/tablecmds.c       |   7 --
 src/backend/commands/trigger.c         | 106 ++++++++++++++++++++++---
 src/backend/partitioning/partdesc.c    |   2 +-
 src/include/utils/reltrigger.h         |   1 +
 src/test/regress/expected/triggers.out |  53 ++++++++++++-
 src/test/regress/sql/triggers.sql      |  39 ++++++++-
 6 files changed, 182 insertions(+), 26 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 02a7c04fdb..3b560067a4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16503,13 +16503,6 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 			 !OidIsValid(trigForm->tgparentid)))
 			continue;
 
-		/*
-		 * Complain if we find an unexpected trigger type.
-		 */
-		if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
-			elog(ERROR, "unexpected trigger \"%s\" found",
-				 NameStr(trigForm->tgname));
-
 		/* Use short-lived context for CREATE TRIGGER */
 		oldcxt = MemoryContextSwitchTo(perTupCxt);
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 6e8b7223fe..d4687de211 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -53,10 +53,12 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/datum.h"
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/partcache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
@@ -82,6 +84,9 @@ static int	MyTriggerDepth = 0;
 /* Local function prototypes */
 static void ConvertTriggerToFK(CreateTrigStmt *stmt, Oid funcoid);
 static void SetTriggerFlags(TriggerDesc *trigdesc, Trigger *trigger);
+static void ReportTriggerPartkeyChange(ResultRelInfo *relinfo,
+					HeapTuple oldtuple, HeapTuple newtuple,
+					const char *trigname);
 static bool GetTupleForTrigger(EState *estate,
 							   EPQState *epqstate,
 							   ResultRelInfo *relinfo,
@@ -226,18 +231,6 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 		 */
 		if (stmt->row)
 		{
-			/*
-			 * BEFORE triggers FOR EACH ROW are forbidden, because they would
-			 * allow the user to direct the row to another partition, which
-			 * isn't implemented in the executor.
-			 */
-			if (stmt->timing != TRIGGER_TYPE_AFTER)
-				ereport(ERROR,
-						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-						 errmsg("\"%s\" is a partitioned table",
-								RelationGetRelationName(rel)),
-						 errdetail("Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.")));
-
 			/*
 			 * Disallow use of transition tables.
 			 *
@@ -1977,6 +1970,7 @@ RelationBuildTriggers(Relation relation)
 		build->tgtype = pg_trigger->tgtype;
 		build->tgenabled = pg_trigger->tgenabled;
 		build->tgisinternal = pg_trigger->tgisinternal;
+		build->tgisclone = OidIsValid(pg_trigger->tgparentid);
 		build->tgconstrrelid = pg_trigger->tgconstrrelid;
 		build->tgconstrindid = pg_trigger->tgconstrindid;
 		build->tgconstraint = pg_trigger->tgconstraint;
@@ -2280,6 +2274,8 @@ equalTriggerDescs(TriggerDesc *trigdesc1, TriggerDesc *trigdesc2)
 				return false;
 			if (trig1->tgisinternal != trig2->tgisinternal)
 				return false;
+			if (trig1->tgisclone != trig2->tgisclone)
+				return false;
 			if (trig1->tgconstrrelid != trig2->tgconstrrelid)
 				return false;
 			if (trig1->tgconstrindid != trig2->tgconstrindid)
@@ -2576,6 +2572,14 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		}
 		else if (newtuple != oldtuple)
 		{
+			/*
+			 * If this trigger is a clone of some trigger in a partitioned
+			 * table, disallow it from changing the partitioning key.
+			 */
+			if (trigger->tgisclone)
+				ReportTriggerPartkeyChange(relinfo, oldtuple, newtuple,
+										   trigger->tgname);
+
 			ExecForceStoreHeapTuple(newtuple, slot, false);
 
 			if (should_free)
@@ -3102,6 +3106,10 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 		{
 			ExecForceStoreHeapTuple(newtuple, newslot, false);
 
+			if (trigger->tgisclone)
+				ReportTriggerPartkeyChange(relinfo, trigtuple, newtuple,
+										   trigger->tgname);
+
 			/*
 			 * If the tuple returned by the trigger / being stored, is the old
 			 * row version, and the heap tuple passed to the trigger was
@@ -3411,6 +3419,80 @@ GetTupleForTrigger(EState *estate,
 	return true;
 }
 
+/*
+ * When a tuple in a partition is being modified by a BEFORE trigger, raise an
+ * error if the partitioning key columns are changed by the trigger.
+ */
+static void
+ReportTriggerPartkeyChange(ResultRelInfo *relinfo,
+					HeapTuple oldtuple, HeapTuple newtuple,
+					const char *trigname)
+{
+	Oid			current_rel_id;
+
+	/*
+	 * For each partitioning level, climbing up from the leaf, check whether
+	 * the partitioning key columns are being bytewise modified.  If any
+	 * column is modified, abort the operation.
+	 *
+	 * Repeat the check until we reach the root of the partitioning hierarchy.
+	 */
+	current_rel_id = relinfo->ri_RelationDesc->rd_id;
+	for (;;)
+	{
+		PartitionKey	key;
+		Oid				parent;
+		Relation		prel;
+
+		parent = get_partition_parent(current_rel_id);
+		prel = relation_open(parent, NoLock);	/* we already hold lock */
+		key = RelationGetPartitionKey(prel);
+		for (int i = 0; i < key->partnatts; i++)
+		{
+			Form_pg_attribute att;
+			AttrNumber	attnum = key->partattrs[i];
+			bool	isnull;
+			Datum	oatt,
+					natt;
+
+			/*
+			 * FIXME [at least] two problems here: 1. we repeat attr
+			 * extraction for every partitioning level; 2. we need to map
+			 * attribute numbers.  Maybe it'd be better to be passed slots, not
+			 * heap-tuples.
+			 */
+			oatt = heap_getattr(oldtuple, attnum, prel->rd_att, &isnull);
+			Assert(!isnull);	/* partkeys cannot be null */
+			natt = heap_getattr(newtuple, attnum, prel->rd_att, &isnull);
+			Assert(!isnull);
+
+			att = TupleDescAttr(prel->rd_att, attnum - 1);
+
+			/*
+			 * No need for datum_image_eq detoasting here; if the partition key
+			 * changed at all, report that as an error.
+			 */
+			if (!datumIsEqual(oatt, natt,
+							  att->attbyval, att->attlen))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("changing partitioning columns in a before trigger is not supported"),
+						 errdetail("Column \"%s\" was changed by trigger \"%s\".",
+								   NameStr(att->attname), trigname)));
+		}
+
+		/* up one level */
+		current_rel_id = parent;
+		relation_close(prel, NoLock);
+
+		/* If we just checked the root, we're done */
+		if (current_rel_id == RelationGetRelid(relinfo->ri_PartitionRoot))
+			break;
+
+		CHECK_FOR_INTERRUPTS();		/* for luck */
+	}
+}
+
 /*
  * Is trigger enabled to fire?
  */
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index bdda42e40f..e7f23542e8 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt)
  *
  * The purpose of this function is to ensure that we get the same
  * PartitionDesc for each relation every time we look it up.  In the
- * face of current DDL, different PartitionDescs may be constructed with
+ * face of concurrent DDL, different PartitionDescs may be constructed with
  * different views of the catalog state, but any single particular OID
  * will always get the same PartitionDesc for as long as the same
  * PartitionDirectory is used.
diff --git a/src/include/utils/reltrigger.h b/src/include/utils/reltrigger.h
index 28df43d833..b22acb034e 100644
--- a/src/include/utils/reltrigger.h
+++ b/src/include/utils/reltrigger.h
@@ -29,6 +29,7 @@ typedef struct Trigger
 	int16		tgtype;
 	char		tgenabled;
 	bool		tgisinternal;
+	bool		tgisclone;
 	Oid			tgconstrrelid;
 	Oid			tgconstrindid;
 	Oid			tgconstraint;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 22e65cc1ec..dfd7cef743 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -1958,10 +1958,6 @@ drop table my_table;
 create table parted_trig (a int) partition by list (a);
 create function trigger_nothing() returns trigger
   language plpgsql as $$ begin end; $$;
-create trigger failed before insert or update or delete on parted_trig
-  for each row execute procedure trigger_nothing();
-ERROR:  "parted_trig" is a partitioned table
-DETAIL:  Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.
 create trigger failed instead of update on parted_trig
   for each row execute procedure trigger_nothing();
 ERROR:  "parted_trig" is a table
@@ -2246,6 +2242,55 @@ NOTICE:  aasvogel <- woof!
 NOTICE:  trigger parted_trig on parted1_irreg AFTER INSERT for ROW: (a,b)=(3,aasvogel)
 NOTICE:  trigger parted_trig_odd on parted1_irreg AFTER INSERT for ROW: (a,b)=(3,aasvogel)
 drop table parted_irreg_ancestor;
+-- Before triggers and partitions
+create table parted (a int, b int, c text) partition by list (a);
+create table parted_1 partition of parted for values in (1)
+  partition by list (b);
+create table parted_1_1 partition of parted_1 for values in (1);
+create function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.a = new.a + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');    -- works
+create trigger t before insert or update or delete on parted
+  for each row execute procedure parted_trigfunc();
+insert into parted values (1, 1, 'uno uno v2');    -- fail
+ERROR:  changing partitioning columns in a before trigger is not supported
+DETAIL:  Column "a" was changed by trigger "t".
+update parted set c = c || 'v3';                   -- fail
+ERROR:  changing partitioning columns in a before trigger is not supported
+DETAIL:  Column "a" was changed by trigger "t".
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.b = new.b + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno v4');    -- fail
+ERROR:  changing partitioning columns in a before trigger is not supported
+DETAIL:  Column "b" was changed by trigger "t".
+update parted set c = c || 'v5';                   -- fail
+ERROR:  changing partitioning columns in a before trigger is not supported
+DETAIL:  Column "b" was changed by trigger "t".
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.c = new.c || ' and so';
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');       -- works
+update parted set c = c || ' v6';                   -- works
+select tableoid::regclass, * from parted;
+  tableoid  | a | b |            c             
+------------+---+---+--------------------------
+ parted_1_1 | 1 | 1 | uno uno v6 and so
+ parted_1_1 | 1 | 1 | uno uno and so v6 and so
+(2 rows)
+
+drop table parted;
+drop function parted_trigfunc();
 --
 -- Constraint triggers and partitioned tables
 create table parted_constr_ancestor (a int, b text)
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 0f61fdf0ea..3d9f14618c 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1348,8 +1348,6 @@ drop table my_table;
 create table parted_trig (a int) partition by list (a);
 create function trigger_nothing() returns trigger
   language plpgsql as $$ begin end; $$;
-create trigger failed before insert or update or delete on parted_trig
-  for each row execute procedure trigger_nothing();
 create trigger failed instead of update on parted_trig
   for each row execute procedure trigger_nothing();
 create trigger failed after update on parted_trig
@@ -1561,6 +1559,43 @@ insert into parted1_irreg values ('aardwolf', 2);
 insert into parted_irreg_ancestor values ('aasvogel', 3);
 drop table parted_irreg_ancestor;
 
+-- Before triggers and partitions
+create table parted (a int, b int, c text) partition by list (a);
+create table parted_1 partition of parted for values in (1)
+  partition by list (b);
+create table parted_1_1 partition of parted_1 for values in (1);
+create function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.a = new.a + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');    -- works
+create trigger t before insert or update or delete on parted
+  for each row execute procedure parted_trigfunc();
+insert into parted values (1, 1, 'uno uno v2');    -- fail
+update parted set c = c || 'v3';                   -- fail
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.b = new.b + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno v4');    -- fail
+update parted set c = c || 'v5';                   -- fail
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.c = new.c || ' and so';
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');       -- works
+update parted set c = c || ' v6';                   -- works
+select tableoid::regclass, * from parted;
+
+drop table parted;
+drop function parted_trigfunc();
+
 --
 -- Constraint triggers and partitioned tables
 create table parted_constr_ancestor (a int, b text)
-- 
2.20.1

#2Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Alvaro Herrera (#1)
Re: BEFORE ROW triggers for partitioned tables

On 2020-02-27 17:51, Alvaro Herrera wrote:

Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy
-- just remove the check against them. With that, they work fine.

This looks good to me in principle. It's a useful thing to support.

1. Just let it be. If the user does something silly, it's their problem
if they get an ugly error message.

2. If the partition keys are changed, raise an error. The trigger is
allowed to change everything but those columns. Then there's no
conflict, and it allows desirable use cases.

3. Allow the partition keys to change, as long as the tuple ends up in
the same partition. This is the same as (1) except the error message is
nicer.

The attached patch implements (2).

That seems alright to me.

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined). Also, it has a performance issue,
namely that we do heap_getattr() potentially repeatedly -- maybe it'd be
better to "slotify" the tuple prior to doing the checks.

The attribute ordering issue obviously needs to be addressed, but the
performance issue is probably not so important. How many levels of
partitioning are we expecting?

Another
possible controversial point is that its location in commands/trigger.c
isn't great. (Frankly, I don't understand why the executor trigger
firing functions are in commands/ at all.)

yeah ...

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Alvaro Herrera (#1)
Re: BEFORE ROW triggers for partitioned tables

On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

* The "root" is not necessarily the root partitioned table, but instead
it's the table that was named in the command. Because of this, we don't
need to acquire locks on the tables, since the executor already has the
tables open and locked (thus they cannot be modified by concurrent
commands).

I believe this is because of the partition level constraints on the
table that was named in the command would catch any violation in the
partition key change in the levels above that table.

Will it be easier to subject the new tuple to the partition level
constraints themselves and report if those are violated. See
RelationGetPartitionQual() for getting partition constraints. This
function includes partition constraints from all the levels so in your
function you don't have to walk up the partition tree. It includes
constraints from the level above the table that was named in the
command, but that might be fine. We will catch the error earlier and
may be provide a better error message.

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined).

IIUC the code in your patch, it seems you are just looking at
partnatts. But partition key can contain expressions also which are
stored in partexprs. So, I think the code won't catch change in the
partition key values when it contains expression. Using
RelationGetPartitionQual() will fix this problem and also problem of
attribute remapping across the partition hierarchy.

--
Best Wishes,
Ashutosh Bapat

#4Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ashutosh Bapat (#3)
Re: BEFORE ROW triggers for partitioned tables

On Wed, Mar 11, 2020 at 8:53 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

* The "root" is not necessarily the root partitioned table, but instead
it's the table that was named in the command. Because of this, we don't
need to acquire locks on the tables, since the executor already has the
tables open and locked (thus they cannot be modified by concurrent
commands).

I believe this is because of the partition level constraints on the
table that was named in the command would catch any violation in the
partition key change in the levels above that table.

Will it be easier to subject the new tuple to the partition level
constraints themselves and report if those are violated. See
RelationGetPartitionQual() for getting partition constraints. This
function includes partition constraints from all the levels so in your
function you don't have to walk up the partition tree. It includes
constraints from the level above the table that was named in the
command, but that might be fine. We will catch the error earlier and
may be provide a better error message.

I realized that this will implement the third option in your original
proposal, not the second one. I suppose that's fine too?
--
Best Wishes,
Ashutosh Bapat

#5Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Ashutosh Bapat (#4)
Re: BEFORE ROW triggers for partitioned tables

On 2020-03-12 05:17, Ashutosh Bapat wrote:

On Wed, Mar 11, 2020 at 8:53 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

Will it be easier to subject the new tuple to the partition level
constraints themselves and report if those are violated. See
RelationGetPartitionQual() for getting partition constraints. This
function includes partition constraints from all the levels so in your
function you don't have to walk up the partition tree. It includes
constraints from the level above the table that was named in the
command, but that might be fine. We will catch the error earlier and
may be provide a better error message.

I realized that this will implement the third option in your original
proposal, not the second one. I suppose that's fine too?

It might be that that is actually easier to do. Instead of trying to
figure out which columns have changed, in the face of different column
ordering and general expressions, just check after a trigger whether the
column still fits into the partition.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#3)
1 attachment(s)
Re: BEFORE ROW triggers for partitioned tables

On 2020-Mar-11, Ashutosh Bapat wrote:

On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined).

IIUC the code in your patch, it seems you are just looking at
partnatts. But partition key can contain expressions also which are
stored in partexprs. So, I think the code won't catch change in the
partition key values when it contains expression. Using
RelationGetPartitionQual() will fix this problem and also problem of
attribute remapping across the partition hierarchy.

Oh, of course.

In fact, I don't need to deal with PartitionQual directly; I can just
use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already
have all we need. v2 attached.

Here's some example output. With my previous patch, this was the error
we reported:

insert into parted values (1, 1, 'uno uno v2'); -- fail
ERROR: changing partitioning columns in a before trigger is not supported
DETAIL: Column "a" was changed by trigger "t".

Now, passing emitError=true to ExecPartitionCheck, I get this:

insert into parted values (1, 1, 'uno uno v2'); -- fail
ERROR: new row for relation "parted_1_1" violates partition constraint
DETAIL: Failing row contains (2, 1, uno uno v2).

Note the discrepancy in the table named in the INSERT vs. the one in the
error message. This is a low-quality error IMO. So I'd instead pass
emitError=false, and produce my own error. It's useful to report the
trigger name and original partition name:

insert into parted values (1, 1, 'uno uno v2'); -- fail
ERROR: moving row to another partition during a BEFORE trigger is not supported
DETAIL: Before trigger "t", row was to be in partition "public.parted_1_1"

Note that in this implementation I no longer know which column is the
problematic one, but I suppose users have clue enough. Wording
suggestions welcome.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-Enable-BEFORE-row-level-triggers-for-partitioned-.patchtext/x-diff; charset=us-asciiDownload
From f13d7ad4c081ede118b0f6c262ad92a49ea9f64c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 26 Feb 2020 17:34:54 -0300
Subject: [PATCH v2] Enable BEFORE row-level triggers for partitioned tables

... with the limitation that if partitioning columns are changed, the
operation is aborted.  (The reason for this limitation is that it might
require re-routing the tuple, which doesn't work.)
---
 src/backend/commands/tablecmds.c       |  7 ----
 src/backend/commands/trigger.c         | 40 +++++++++++++------
 src/backend/partitioning/partdesc.c    |  2 +-
 src/include/utils/reltrigger.h         |  1 +
 src/test/regress/expected/triggers.out | 53 ++++++++++++++++++++++++--
 src/test/regress/sql/triggers.sql      | 39 ++++++++++++++++++-
 6 files changed, 116 insertions(+), 26 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3eb861bfbf..08ad595596 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16501,13 +16501,6 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 			 !OidIsValid(trigForm->tgparentid)))
 			continue;
 
-		/*
-		 * Complain if we find an unexpected trigger type.
-		 */
-		if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
-			elog(ERROR, "unexpected trigger \"%s\" found",
-				 NameStr(trigForm->tgname));
-
 		/* Use short-lived context for CREATE TRIGGER */
 		oldcxt = MemoryContextSwitchTo(perTupCxt);
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 513427edf1..98911facad 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -221,18 +221,6 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 		 */
 		if (stmt->row)
 		{
-			/*
-			 * BEFORE triggers FOR EACH ROW are forbidden, because they would
-			 * allow the user to direct the row to another partition, which
-			 * isn't implemented in the executor.
-			 */
-			if (stmt->timing != TRIGGER_TYPE_AFTER)
-				ereport(ERROR,
-						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-						 errmsg("\"%s\" is a partitioned table",
-								RelationGetRelationName(rel)),
-						 errdetail("Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.")));
-
 			/*
 			 * Disallow use of transition tables.
 			 *
@@ -1658,6 +1646,7 @@ RelationBuildTriggers(Relation relation)
 		build->tgtype = pg_trigger->tgtype;
 		build->tgenabled = pg_trigger->tgenabled;
 		build->tgisinternal = pg_trigger->tgisinternal;
+		build->tgisclone = OidIsValid(pg_trigger->tgparentid);
 		build->tgconstrrelid = pg_trigger->tgconstrrelid;
 		build->tgconstrindid = pg_trigger->tgconstrindid;
 		build->tgconstraint = pg_trigger->tgconstraint;
@@ -1961,6 +1950,8 @@ equalTriggerDescs(TriggerDesc *trigdesc1, TriggerDesc *trigdesc2)
 				return false;
 			if (trig1->tgisinternal != trig2->tgisinternal)
 				return false;
+			if (trig1->tgisclone != trig2->tgisclone)
+				return false;
 			if (trig1->tgconstrrelid != trig2->tgconstrrelid)
 				return false;
 			if (trig1->tgconstrindid != trig2->tgconstrindid)
@@ -2247,6 +2238,21 @@ ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 		{
 			ExecForceStoreHeapTuple(newtuple, slot, false);
 
+			/*
+			 * After a tuple in a partition goes through a trigger, the user
+			 * could have changed the partition key enough that the tuple
+			 * no longer fits the partition.  Verify that.
+			 */
+			if (trigger->tgisclone &&
+				!ExecPartitionCheck(relinfo, slot, estate, false))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("moving row to another partition during a BEFORE trigger is not supported"),
+						 errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",
+								   trigger->tgname,
+								   get_namespace_name(RelationGetNamespace(relinfo->ri_RelationDesc)),
+								   RelationGetRelationName(relinfo->ri_RelationDesc))));
+
 			if (should_free)
 				heap_freetuple(oldtuple);
 
@@ -2741,6 +2747,16 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 		{
 			ExecForceStoreHeapTuple(newtuple, newslot, false);
 
+			if (trigger->tgisclone &&
+				!ExecPartitionCheck(relinfo, newslot, estate, false))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("moving row to another partition during a BEFORE trigger is not supported"),
+						 errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",
+								   trigger->tgname,
+								   get_namespace_name(RelationGetNamespace(relinfo->ri_RelationDesc)),
+								   RelationGetRelationName(relinfo->ri_RelationDesc))));
+
 			/*
 			 * If the tuple returned by the trigger / being stored, is the old
 			 * row version, and the heap tuple passed to the trigger was
diff --git a/src/backend/partitioning/partdesc.c b/src/backend/partitioning/partdesc.c
index bdda42e40f..e7f23542e8 100644
--- a/src/backend/partitioning/partdesc.c
+++ b/src/backend/partitioning/partdesc.c
@@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt)
  *
  * The purpose of this function is to ensure that we get the same
  * PartitionDesc for each relation every time we look it up.  In the
- * face of current DDL, different PartitionDescs may be constructed with
+ * face of concurrent DDL, different PartitionDescs may be constructed with
  * different views of the catalog state, but any single particular OID
  * will always get the same PartitionDesc for as long as the same
  * PartitionDirectory is used.
diff --git a/src/include/utils/reltrigger.h b/src/include/utils/reltrigger.h
index 28df43d833..b22acb034e 100644
--- a/src/include/utils/reltrigger.h
+++ b/src/include/utils/reltrigger.h
@@ -29,6 +29,7 @@ typedef struct Trigger
 	int16		tgtype;
 	char		tgenabled;
 	bool		tgisinternal;
+	bool		tgisclone;
 	Oid			tgconstrrelid;
 	Oid			tgconstrindid;
 	Oid			tgconstraint;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 22e65cc1ec..c51e3acb14 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -1958,10 +1958,6 @@ drop table my_table;
 create table parted_trig (a int) partition by list (a);
 create function trigger_nothing() returns trigger
   language plpgsql as $$ begin end; $$;
-create trigger failed before insert or update or delete on parted_trig
-  for each row execute procedure trigger_nothing();
-ERROR:  "parted_trig" is a partitioned table
-DETAIL:  Partitioned tables cannot have BEFORE / FOR EACH ROW triggers.
 create trigger failed instead of update on parted_trig
   for each row execute procedure trigger_nothing();
 ERROR:  "parted_trig" is a table
@@ -2246,6 +2242,55 @@ NOTICE:  aasvogel <- woof!
 NOTICE:  trigger parted_trig on parted1_irreg AFTER INSERT for ROW: (a,b)=(3,aasvogel)
 NOTICE:  trigger parted_trig_odd on parted1_irreg AFTER INSERT for ROW: (a,b)=(3,aasvogel)
 drop table parted_irreg_ancestor;
+-- Before triggers and partitions
+create table parted (a int, b int, c text) partition by list (a);
+create table parted_1 partition of parted for values in (1)
+  partition by list (b);
+create table parted_1_1 partition of parted_1 for values in (1);
+create function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.a = new.a + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');    -- works
+create trigger t before insert or update or delete on parted
+  for each row execute procedure parted_trigfunc();
+insert into parted values (1, 1, 'uno uno v2');    -- fail
+ERROR:  moving row to another partition during a BEFORE trigger is not supported
+DETAIL:  Before trigger "t", row was to be in partition "public.parted_1_1"
+update parted set c = c || 'v3';                   -- fail
+ERROR:  moving row to another partition during a BEFORE trigger is not supported
+DETAIL:  Before trigger "t", row was to be in partition "public.parted_1_1"
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.b = new.b + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno v4');    -- fail
+ERROR:  moving row to another partition during a BEFORE trigger is not supported
+DETAIL:  Before trigger "t", row was to be in partition "public.parted_1_1"
+update parted set c = c || 'v5';                   -- fail
+ERROR:  moving row to another partition during a BEFORE trigger is not supported
+DETAIL:  Before trigger "t", row was to be in partition "public.parted_1_1"
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.c = new.c || ' and so';
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');       -- works
+update parted set c = c || ' v6';                   -- works
+select tableoid::regclass, * from parted;
+  tableoid  | a | b |            c             
+------------+---+---+--------------------------
+ parted_1_1 | 1 | 1 | uno uno v6 and so
+ parted_1_1 | 1 | 1 | uno uno and so v6 and so
+(2 rows)
+
+drop table parted;
+drop function parted_trigfunc();
 --
 -- Constraint triggers and partitioned tables
 create table parted_constr_ancestor (a int, b text)
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 0f61fdf0ea..3d9f14618c 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1348,8 +1348,6 @@ drop table my_table;
 create table parted_trig (a int) partition by list (a);
 create function trigger_nothing() returns trigger
   language plpgsql as $$ begin end; $$;
-create trigger failed before insert or update or delete on parted_trig
-  for each row execute procedure trigger_nothing();
 create trigger failed instead of update on parted_trig
   for each row execute procedure trigger_nothing();
 create trigger failed after update on parted_trig
@@ -1561,6 +1559,43 @@ insert into parted1_irreg values ('aardwolf', 2);
 insert into parted_irreg_ancestor values ('aasvogel', 3);
 drop table parted_irreg_ancestor;
 
+-- Before triggers and partitions
+create table parted (a int, b int, c text) partition by list (a);
+create table parted_1 partition of parted for values in (1)
+  partition by list (b);
+create table parted_1_1 partition of parted_1 for values in (1);
+create function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.a = new.a + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');    -- works
+create trigger t before insert or update or delete on parted
+  for each row execute procedure parted_trigfunc();
+insert into parted values (1, 1, 'uno uno v2');    -- fail
+update parted set c = c || 'v3';                   -- fail
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.b = new.b + 1;
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno v4');    -- fail
+update parted set c = c || 'v5';                   -- fail
+create or replace function parted_trigfunc() returns trigger language plpgsql as $$
+begin
+  new.c = new.c || ' and so';
+  return new;
+end;
+$$;
+insert into parted values (1, 1, 'uno uno');       -- works
+update parted set c = c || ' v6';                   -- works
+select tableoid::regclass, * from parted;
+
+drop table parted;
+drop function parted_trigfunc();
+
 --
 -- Constraint triggers and partitioned tables
 create table parted_constr_ancestor (a int, b text)
-- 
2.20.1

#7Ashutosh Bapat
ashutosh.bapat@2ndquadrant.com
In reply to: Alvaro Herrera (#6)
Re: BEFORE ROW triggers for partitioned tables

On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2020-Mar-11, Ashutosh Bapat wrote:

On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined).

IIUC the code in your patch, it seems you are just looking at
partnatts. But partition key can contain expressions also which are
stored in partexprs. So, I think the code won't catch change in the
partition key values when it contains expression. Using
RelationGetPartitionQual() will fix this problem and also problem of
attribute remapping across the partition hierarchy.

Oh, of course.

In fact, I don't need to deal with PartitionQual directly; I can just
use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already
have all we need. v2 attached.

Thanks.

insert into parted values (1, 1, 'uno uno v2'); -- fail
ERROR: moving row to another partition during a BEFORE trigger is not
supported
DETAIL: Before trigger "t", row was to be in partition
"public.parted_1_1"

Note that in this implementation I no longer know which column is the
problematic one, but I suppose users have clue enough. Wording
suggestions welcome.

When we have expression as a partition key, there may not be one particular
column which causes the row to move out of partition. So, this should be
fine.
A slight wording suggestion below.

- * Complain if we find an unexpected trigger type.
- */
- if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
- elog(ERROR, "unexpected trigger \"%s\" found",
- NameStr(trigForm->tgname));

!AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF
triggers
as well?
- */
- if (stmt->timing != TRIGGER_TYPE_AFTER)

Same comment as the above?

+ /*
+ * After a tuple in a partition goes through a trigger, the user
+ * could have changed the partition key enough that the tuple
+ * no longer fits the partition.  Verify that.
+ */
+ if (trigger->tgisclone &&

Why do we want to restrict this check only for triggers which are cloned
from
the ancestors?

+ !ExecPartitionCheck(relinfo, slot, estate, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("moving row to another partition during a BEFORE trigger is not
supported"),
+ errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",

In the error message you removed above, we are mentioning BEFORE FOR EACH
ROW
trigger. Should we continue to use the same terminology?

I was wondering whether it would be good to check the partition constraint
only
once i.e. after all before row triggers have been executed. This would avoid
throwing an error in case multiple triggers together worked to keep the
tuple
in the same partition when individual trigger/s caused it to move out of
that
partition. But then we would loose the opportunity to point out the before
row
trigger which actually caused the row to move out of the partition. Anyway,
wanted to bring that for the discussion here.

@@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt)
  *
  * The purpose of this function is to ensure that we get the same
  * PartitionDesc for each relation every time we look it up.  In the
- * face of current DDL, different PartitionDescs may be constructed with
+ * face of concurrent DDL, different PartitionDescs may be constructed with

Thanks for catching it. Looks unrelated though.

+-- Before triggers and partitions

The test looks good. Should we add a test for partitioned table with
partition
key as expression?

The approach looks good to me.

--
Best Wishes,
Ashutosh

#8Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Ashutosh Bapat (#7)
Re: BEFORE ROW triggers for partitioned tables

I was expecting that documentation somewhere covered the fact that BR
triggers are not supported on a partitioned table. And this patch
would remove/improve that portion of the code. But I don't see any
documentation changes in this patch.

On Tue, Mar 17, 2020 at 10:11 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:

On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Mar-11, Ashutosh Bapat wrote:

On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined).

IIUC the code in your patch, it seems you are just looking at
partnatts. But partition key can contain expressions also which are
stored in partexprs. So, I think the code won't catch change in the
partition key values when it contains expression. Using
RelationGetPartitionQual() will fix this problem and also problem of
attribute remapping across the partition hierarchy.

Oh, of course.

In fact, I don't need to deal with PartitionQual directly; I can just
use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already
have all we need. v2 attached.

Thanks.

insert into parted values (1, 1, 'uno uno v2'); -- fail
ERROR: moving row to another partition during a BEFORE trigger is not supported
DETAIL: Before trigger "t", row was to be in partition "public.parted_1_1"

Note that in this implementation I no longer know which column is the
problematic one, but I suppose users have clue enough. Wording
suggestions welcome.

When we have expression as a partition key, there may not be one particular column which causes the row to move out of partition. So, this should be fine.
A slight wording suggestion below.

- * Complain if we find an unexpected trigger type.
- */
- if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
- elog(ERROR, "unexpected trigger \"%s\" found",
- NameStr(trigForm->tgname));

!AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF triggers
as well?
- */
- if (stmt->timing != TRIGGER_TYPE_AFTER)

Same comment as the above?

+ /*
+ * After a tuple in a partition goes through a trigger, the user
+ * could have changed the partition key enough that the tuple
+ * no longer fits the partition.  Verify that.
+ */
+ if (trigger->tgisclone &&

Why do we want to restrict this check only for triggers which are cloned from
the ancestors?

+ !ExecPartitionCheck(relinfo, slot, estate, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("moving row to another partition during a BEFORE trigger is not supported"),
+ errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",

In the error message you removed above, we are mentioning BEFORE FOR EACH ROW
trigger. Should we continue to use the same terminology?

I was wondering whether it would be good to check the partition constraint only
once i.e. after all before row triggers have been executed. This would avoid
throwing an error in case multiple triggers together worked to keep the tuple
in the same partition when individual trigger/s caused it to move out of that
partition. But then we would loose the opportunity to point out the before row
trigger which actually caused the row to move out of the partition. Anyway,
wanted to bring that for the discussion here.

@@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt)
*
* The purpose of this function is to ensure that we get the same
* PartitionDesc for each relation every time we look it up.  In the
- * face of current DDL, different PartitionDescs may be constructed with
+ * face of concurrent DDL, different PartitionDescs may be constructed with

Thanks for catching it. Looks unrelated though.

+-- Before triggers and partitions

The test looks good. Should we add a test for partitioned table with partition
key as expression?

The approach looks good to me.

--
Best Wishes,
Ashutosh

--
Best Wishes,
Ashutosh Bapat

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ashutosh Bapat (#7)
Re: BEFORE ROW triggers for partitioned tables

On 2020-Mar-17, Ashutosh Bapat wrote:

On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Note that in this implementation I no longer know which column is the
problematic one, but I suppose users have clue enough. Wording
suggestions welcome.

When we have expression as a partition key, there may not be one particular
column which causes the row to move out of partition. So, this should be
fine.

True.

A slight wording suggestion below.

- * Complain if we find an unexpected trigger type.
- */
- if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
- elog(ERROR, "unexpected trigger \"%s\" found",
- NameStr(trigForm->tgname));

!AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF
triggers as well?

Hmm, yeah, this should check both types; I'll put it back. Note that
this is just a cross-check that the catalogs we're going to copy don't
contain bogus info; the real backstop for that at the user level is in
the other one you complain about:

- */
- if (stmt->timing != TRIGGER_TYPE_AFTER)

Same comment as the above?

Note that in this one we have a check for INSTEAD before we enter the
FOR EACH ROW block, so this case is already covered -- AFAICS the code
is correct.

+ /*
+ * After a tuple in a partition goes through a trigger, the user
+ * could have changed the partition key enough that the tuple
+ * no longer fits the partition.  Verify that.
+ */
+ if (trigger->tgisclone &&

Why do we want to restrict this check only for triggers which are
cloned from the ancestors?

Because it's not our business in the other case. When the trigger is
defined directly in the partition, it's the user's problem if something
going amiss.

+ !ExecPartitionCheck(relinfo, slot, estate, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("moving row to another partition during a BEFORE trigger is not
supported"),
+ errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",

In the error message you removed above, we are mentioning BEFORE FOR EACH
ROW trigger. Should we continue to use the same terminology?

Sounds good, I'll change that.

I also changed the errdetail slightly:
errdetail("Before executing trigger \"%s\", the row was to be in partition \"%s.%s\"",

I was wondering whether it would be good to check the partition
constraint only once i.e. after all before row triggers have been
executed. This would avoid throwing an error in case multiple triggers
together worked to keep the tuple in the same partition when
individual trigger/s caused it to move out of that partition. But then
we would loose the opportunity to point out the before row trigger
which actually caused the row to move out of the partition. Anyway,
wanted to bring that for the discussion here.

Yeah, I too thought about a combination of triggers that move the tuple
elsewhere and back. Frankly, I don't think we need to support that. It
sounds devious and likely we'll miss some odd corner case -- anything
involving the weird cross-partition UPDATE mechanism sounds easy to get
wrong.

+-- Before triggers and partitions

The test looks good. Should we add a test for partitioned table with
partition
key as expression?

Will do.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#9)
Re: BEFORE ROW triggers for partitioned tables

Thanks for the reviews; I have pushed it now.

(I made the doc edits you mentioned too.)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11李杰(慎追)
adger.lj@alibaba-inc.com
In reply to: Alvaro Herrera (#10)
回复:BEFORE ROW triggers for partitioned tables

Hi commiter,

Many of our customers expect to use BR triggers in partitioned tables.
After I followed your discussion, I also checked your patch.
Here are two questions confusing me:

1. Your modification removes the check BR triggers against partitioned table,
and a more friendly error message is added to the ExecInsert and ExecUpdate.
You are correct. ExecInsert does not reroute partitions.
However, when ExecUpdate modifies partition keys,
it is almost equivalent to ExecDelete and ExecInsert,
and it is re-routed(ExecPrepareTupleRouting) once before ExecInsert. Therefore,
why should an error be thrown in ExecUpdate?
Let's look at a case :
```
postgres=# create table parted (a int, b int, c text) partition by list (a);
CREATE TABLE
postgres=# create table parted_1 partition of parted for values in (1);
CREATE TABLE
postgres=# create table parted_2 partition of parted for values in (2);
CREATE TABLE
postgres=# create function parted_trigfunc() returns trigger language plpgsql as $$
begin
new.a = new.a + 1;
return new;
end;
$$;
CREATE FUNCTION
postgres=# insert into parted values (1, 1, 'uno uno v1');
INSERT 0 1
postgres=# create trigger t before update on parted
for each row execute function parted_trigfunc();
CREATE TRIGGER
postgres=# update parted set c = c || 'v3';
```
If there is no check in the ExecUpdate,
the above update SQL will be executed successfully.
However, in your code, this will fail.
So, what is the reason for your consideration?

2. In this patch, you only removed the restrictions BR trigger against
the partitioned table, but did not fundamentally solve the problem caused
by modifying partition keys in the BR trigger. What are the difficulties in
solving this problem fundamentally? We plan to implement it.
Can you give us some suggestions?

------------------------------------------------------------------
发件人:Alvaro Herrera <alvherre@2ndquadrant.com>
发送时间:2021年1月18日(星期一) 20:36
收件人:Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
抄 送:Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>; Pg Hackers <pgsql-hackers@lists.postgresql.org>
主 题:Re: BEFORE ROW triggers for partitioned tables

Thanks for the reviews; I have pushed it now.

(I made the doc edits you mentioned too.)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12李杰(慎追)
adger.lj@alibaba-inc.com
In reply to: Alvaro Herrera (#10)
回复:BEFORE ROW triggers for partitioned tables

Hi commiter,

Many of our customers expect to use BR triggers in partitioned tables.
After I followed your discussion, I also checked your patch.
Here are two questions confusing me:

1. Your modification removes the check BR triggers against partitioned table,
and a more friendly error message is added to the ExecInsert and ExecUpdate.
You are correct. ExecInsert does not reroute partitions.
However, when ExecUpdate modifies partition keys,
it is almost equivalent to ExecDelete and ExecInsert,
and it is re-routed(ExecPrepareTupleRouting) once before ExecInsert. Therefore,
why should an error be thrown in ExecUpdate?
Let's look at a case :
```
postgres=# create table parted (a int, b int, c text) partition by list (a);
CREATE TABLE
postgres=# create table parted_1 partition of parted for values in (1);
CREATE TABLE
postgres=# create table parted_2 partition of parted for values in (2);
CREATE TABLE
postgres=# create function parted_trigfunc() returns trigger language plpgsql as $$
begin
new.a = new.a + 1;
return new;
end;
$$;
CREATE FUNCTION
postgres=# insert into parted values (1, 1, 'uno uno v1');
INSERT 0 1
postgres=# create trigger t before update on parted
for each row execute function parted_trigfunc();
CREATE TRIGGER
postgres=# update parted set c = c || 'v3';
```
If there is no check in the ExecUpdate,
the above update SQL will be executed successfully.
However, in your code, this will fail.
So, what is the reason for your consideration?

2. In this patch, you only removed the restrictions BR trigger against
the partitioned table, but did not fundamentally solve the problem caused
by modifying partition keys in the BR trigger. What are the difficulties in
solving this problem fundamentally? We plan to implement it.
Can you give us some suggestions?

We look forward to your reply.
Thank you very much,
Regards, Adger

------------------------------------------------------------------
发件人:Alvaro Herrera <alvherre@2ndquadrant.com>
发送时间:2021年1月18日(星期一) 20:36
收件人:Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
抄 送:Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>; Pg Hackers <pgsql-hackers@lists.postgresql.org>
主 题:Re: BEFORE ROW triggers for partitioned tables

Thanks for the reviews; I have pushed it now.

(I made the doc edits you mentioned too.)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services