PG10 transition tables, wCTEs and multiple operations on the same table

Started by Marko Tiikkajaover 8 years ago59 messages
#1Marko Tiikkaja
marko@joh.to

Since the subject of transition tables came up, I thought I'd test how this
case works:

=# create table qwr(a int);
CREATE TABLE

=# create function qq() returns trigger as $$ begin raise notice '%',
(select count(*) from oo); return null; end $$ language plpgsql;
CREATE FUNCTION

=# create trigger xx after insert on qwr referencing new table as oo for
each statement execute procedure qq();
CREATE TRIGGER

=# with t as (insert into qwr values (1)) insert into qwr values (2), (3);
NOTICE: 3
NOTICE: 3
INSERT 0 2

to me, this means that it doesn't work. Surely one of the trigger
invocations should say 1, and the other 2. Or was this intentional?

.m

#2Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Marko Tiikkaja (#1)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Fri, Jun 2, 2017 at 10:48 PM, Marko Tiikkaja <marko@joh.to> wrote:

Since the subject of transition tables came up, I thought I'd test how this
case works:

=# create table qwr(a int);
CREATE TABLE

=# create function qq() returns trigger as $$ begin raise notice '%',
(select count(*) from oo); return null; end $$ language plpgsql;
CREATE FUNCTION

=# create trigger xx after insert on qwr referencing new table as oo for
each statement execute procedure qq();
CREATE TRIGGER

=# with t as (insert into qwr values (1)) insert into qwr values (2), (3);
NOTICE: 3
NOTICE: 3
INSERT 0 2

to me, this means that it doesn't work. Surely one of the trigger
invocations should say 1, and the other 2. Or was this intentional?

I would have expected that to be handled by separate transition tables
at different query levels, but evidently it isn't. The following
crashes:

create table table1(a int);
create table table2(a text);

create function trigfunc() returns trigger as $$ begin raise notice
'got: %', (select oo from oo); return null; end $$ language plpgsql;

create trigger trig1 after insert on table1 referencing new table as
oo for each statement execute procedure trigfunc();

create trigger trig2 after insert on table2 referencing new table as
oo for each statement execute procedure trigfunc();

with t as (insert into table1 values (1)) insert into table2 values ('hello');

I've run out of day today but will investigate.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#2)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Sat, Jun 3, 2017 at 12:10 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I would have expected that to be handled by separate transition tables
at different query levels, but evidently it isn't.

I am wondering if we need to wrap the execution of a modifying CTE in
AfterTriggerBeginQuery() and AfterTriggerEndQuery(). But I'm not sure
where, and it may be a couple of days before I can dig some more.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#3)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Sat, Jun 3, 2017 at 1:20 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Sat, Jun 3, 2017 at 12:10 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I would have expected that to be handled by separate transition tables
at different query levels, but evidently it isn't.

I am wondering if we need to wrap the execution of a modifying CTE in
AfterTriggerBeginQuery() and AfterTriggerEndQuery(). But I'm not sure
where, and it may be a couple of days before I can dig some more.

So, afterTriggers.query_stack is used to handle the reentrancy that
results from triggers running further statements that might fire
triggers. It isn't used for dealing with extra ModifyTable nodes that
can appear in a plan because of wCTEs. Could it also be used for that
purpose? I think that would only work if it is the case that each
ModifyTable node begin and then runs to completion (ie no interleaving
of wCTE execution) and then its AS trigger fires, which I'm not sure
about. If that is the case, perhaps AfterTriggerBeginQuery and
AfterTriggerEndQuery could become the responsibility of
nodeModifyTable.c rather than execMain.c. I haven't tried this yet
and it may well be too ambitious at this stage.

Other ideas: (1) ban wCTEs that target relations with transition
tables in PG10, because we're out of time; (2) find an entirely new
way to keep track of the current active transition table(s) to replace
the current stack approach, such as including them in the
TransitionCaptureState object in the patch that I proposed to fix the
nearby inheritance bugs[1]/messages/by-id/CA+TgmoZzTBBAsEUh4MazAN7ga=8SsMC-Knp-6cetts9yNZUCcg@mail.gmail.com.

Stepping back and squinting a bit, both this and the inheritance bug
stem from a failure to handle multiple ModifyTable nodes in a plan,
but the solutions are approximately opposite: in the inheritance case,
the solution I propose is to teach them all to coordinate their tuple
capture so that it's in a common format, and in the wCTE case the
solution must surely be to ensure that their state is kept separate.
The question I'd like to figure out is whether the existing
AfterTriggerBeginQuery/AfterTriggerEndQuery stack is the right data
structure for that task, considering the control flow when CTEs are
executed, which I need to learn more about.

Thoughts?

[1]: /messages/by-id/CA+TgmoZzTBBAsEUh4MazAN7ga=8SsMC-Knp-6cetts9yNZUCcg@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Kevin Grittner
kgrittn@gmail.com
In reply to: Thomas Munro (#4)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

So, afterTriggers.query_stack is used to handle the reentrancy that
results from triggers running further statements that might fire
triggers. It isn't used for dealing with extra ModifyTable nodes that
can appear in a plan because of wCTEs. Could it also be used for that
purpose? I think that would only work if it is the case that each
ModifyTable node begin and then runs to completion (ie no interleaving
of wCTE execution) and then its AS trigger fires, which I'm not sure
about.

I don't think we want to commit to anything that depends on a CTE
creating an optimization fence, although *maybe* that would be OK in
the case of DML as a CTE. That's a pretty special case; I'm not
sure whether the standard discusses it.

If that is the case, perhaps AfterTriggerBeginQuery and
AfterTriggerEndQuery could become the responsibility of
nodeModifyTable.c rather than execMain.c. I haven't tried this yet
and it may well be too ambitious at this stage.

In a world where one statement can contain multiple DML statements
within CTEs, that may make sense regardless of transition table
issues; but I agree we're past the time to be considering making
such a change for v10.

Other ideas: (1) ban wCTEs that target relations with transition
tables in PG10, because we're out of time;

Given that we haven't even discussed what to do about an UPDATE
statement with a CTE that updates the same table when there are
BEFORE EACH ROW UPDATE triggers on that table (which perhaps return
NULL for some but not all rows?), I'm not sure we've yet plumbed the
depths of this morass.

For example, for this:

drop table if exists t1 cascade;

create table t1 (c1 int not null, c2 text not null default '');
insert into t1 (c1) values (1), (2), (3), (4), (5);

drop function if exists t1_upd_func() cascade;
create function t1_upd_func()
returns trigger
language plpgsql
as $$
begin
if old.c1 between 2 and 4 then
new.c2 = old.c2 || ';' || old.c1::text || ',' || new.c1::text;
end if;
if old.c1 > 3 then
return null;
end if;
return new;
end;
$$;
create trigger t1_upd_update
before update
on t1
for each row execute procedure t1_upd_func();
with x as (update t1 set c1 = c1 - 1 returning c1)
update t1 set c1 = t1.c1 + 1 from x where x.c1 = t1.c1;
select * from t1;

... what is the correct result?

I get this:

c1 | c2
----+------
4 |
5 |
0 |
1 | ;2,1
2 | ;3,2
(5 rows)

It is definitely not what I expected, and seems wrong in multiple
ways from a logical perspective, looking at those as set-based
operations. (Tautologies about the procedural mechanism that
creates the result are not defenses of the result itself.)

Can we fix things exclusive of transition tables in this release?
If not, the only reasonable thing to do is to try to avoid further
complicating things with CTE/transition table usage until we sort
out the basics of triggers firing from CTE DML in the first place.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Kevin Grittner (#5)
1 attachment(s)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Sun, Jun 4, 2017 at 10:41 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

So, afterTriggers.query_stack is used to handle the reentrancy that
results from triggers running further statements that might fire
triggers. It isn't used for dealing with extra ModifyTable nodes that
can appear in a plan because of wCTEs. Could it also be used for that
purpose? I think that would only work if it is the case that each
ModifyTable node begin and then runs to completion (ie no interleaving
of wCTE execution) and then its AS trigger fires, which I'm not sure
about.

I don't think we want to commit to anything that depends on a CTE
creating an optimization fence, although *maybe* that would be OK in
the case of DML as a CTE. That's a pretty special case; I'm not
sure whether the standard discusses it.

According to our manual the standard doesn't allow DML in CTEs. I
suppose it wouldn't make much sense without RETURNING, which is also
non-standard.

If that is the case, perhaps AfterTriggerBeginQuery and
AfterTriggerEndQuery could become the responsibility of
nodeModifyTable.c rather than execMain.c. I haven't tried this yet
and it may well be too ambitious at this stage.

In a world where one statement can contain multiple DML statements
within CTEs, that may make sense regardless of transition table
issues; but I agree we're past the time to be considering making
such a change for v10.

Ok.

Other ideas: (1) ban wCTEs that target relations with transition
tables in PG10, because we're out of time;

Given that we haven't even discussed what to do about an UPDATE
statement with a CTE that updates the same table when there are
BEFORE EACH ROW UPDATE triggers on that table (which perhaps return
NULL for some but not all rows?), I'm not sure we've yet plumbed the
depths of this morass.

[...]

Can we fix things exclusive of transition tables in this release?
If not, the only reasonable thing to do is to try to avoid further
complicating things with CTE/transition table usage until we sort
out the basics of triggers firing from CTE DML in the first place.

At least it's well documented that the execution order is undefined,
but yeah triggers do make things a lot more complicated and I'm not
sure I understand to what extent that is unavoidable once you decide
to allow DML in CTEs.

In the meantime, it seems like you agree that rejecting wCTEs that
affect tables with triggers with transition tables is the best
response to this bug report? Do you think that parse analysis is the
right time to do the check? Here's a first attempt at that.

Later, for the next cycle, I think we should investigate storing the
transition tables in ModifyTableState rather than in query_stack, and
then passing them to the Exec*Triggers() functions as arguments
(possibly inside TransitionCaptureState, if you accept my proposal for
the inheritance bug).

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

reject-wcte-with-transition-tables.patchapplication/octet-stream; name=reject-wcte-with-transition-tables.patchDownload
diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c
index dfbcaa2cdcc..7d5bdb9a7d9 100644
--- a/src/backend/parser/parse_cte.c
+++ b/src/backend/parser/parse_cte.c
@@ -14,13 +14,16 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/analyze.h"
+#include "parser/parsetree.h"
 #include "parser/parse_cte.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/rel.h"
 
 
 /* Enumeration of contexts in which a self-reference is disallowed */
@@ -265,6 +268,35 @@ analyzeCTE(ParseState *pstate, CommonTableExpr *cte)
 				 parser_errposition(pstate, cte->location)));
 
 	/*
+	 * We disallow data-modifying WITH on tables that have triggers with
+	 * transition tables, because we don't yet have separate transition tables
+	 * for them.  This restriction will be lifted in a future release.
+	 */
+	if (query->commandType == CMD_INSERT ||
+		query->commandType == CMD_UPDATE ||
+		query->commandType == CMD_DELETE)
+	{
+		RangeTblEntry *rte = rt_fetch(query->resultRelation, query->rtable);
+
+		if (rte->rtekind == RTE_RELATION)
+		{
+			Relation rel = heap_open(rte->relid, NoLock); /* lock held by rte */
+
+			if (rel->trigdesc != NULL &&
+				(rel->trigdesc->trig_insert_new_table ||
+				 rel->trigdesc->trig_update_old_table ||
+				 rel->trigdesc->trig_update_new_table ||
+				 rel->trigdesc->trig_delete_old_table))
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("WITH clause cannot modify a table that has triggers with transition tables"),
+						 parser_errposition(pstate, cte->location)));
+
+			heap_close(rel, NoLock);
+		}
+	}
+
+	/*
 	 * CTE queries are always marked not canSetTag.  (Currently this only
 	 * matters for data-modifying statements, for which the flag will be
 	 * propagated to the ModifyTable plan node.)
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 0d560fb3eed..112a74c1ff8 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -1892,4 +1892,18 @@ NOTICE:  trigger on parted_stmt_trig AFTER INSERT for STATEMENT
 copy parted_stmt_trig1(a) from stdin;
 NOTICE:  trigger on parted_stmt_trig1 BEFORE INSERT for ROW
 NOTICE:  trigger on parted_stmt_trig1 AFTER INSERT for ROW
+--
+-- Verify that wCTEs are not (yet) allowed to modify tables with
+-- triggers with transition tables.
+--
+create trigger trig_del_after_transition after delete on parted_stmt_trig1
+  referencing old table as old_table
+  for each statement execute procedure trigger_notice();
+with del as (
+  delete from parted_stmt_trig1 returning *
+)
+select;
+ERROR:  WITH clause cannot modify a table that has triggers with transition tables
+LINE 1: with del as (
+             ^
 drop table parted_stmt_trig, parted2_stmt_trig;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 5581fcb1648..62c0b8aef24 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1359,4 +1359,18 @@ copy parted_stmt_trig1(a) from stdin;
 1
 \.
 
+--
+-- Verify that wCTEs are not (yet) allowed to modify tables with
+-- triggers with transition tables.
+--
+
+create trigger trig_del_after_transition after delete on parted_stmt_trig1
+  referencing old table as old_table
+  for each statement execute procedure trigger_notice();
+
+with del as (
+  delete from parted_stmt_trig1 returning *
+)
+select;
+
 drop table parted_stmt_trig, parted2_stmt_trig;
#7Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#6)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

In the meantime, it seems like you agree that rejecting wCTEs that
affect tables with triggers with transition tables is the best
response to this bug report? Do you think that parse analysis is the
right time to do the check? Here's a first attempt at that.

I'm starting to like the approach of reverting the entire transition
tables patch. Failing to consider the possibility of a plan with
multiple ModifyTable nodes seems like a pretty fundamental design
mistake, and I'm not eager either to ship this with that broken or try
to fix it at this stage of the release cycle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

In the meantime, it seems like you agree that rejecting wCTEs that
affect tables with triggers with transition tables is the best
response to this bug report? Do you think that parse analysis is the
right time to do the check? Here's a first attempt at that.

FWIW, parse analysis is surely NOT the time for such a check. Triggers
might get added to a table between analysis and execution. I think you
might have to do it during executor startup.

I'm starting to like the approach of reverting the entire transition
tables patch. Failing to consider the possibility of a plan with
multiple ModifyTable nodes seems like a pretty fundamental design
mistake, and I'm not eager either to ship this with that broken or try
to fix it at this stage of the release cycle.

Postponing the feature to v11 might be a viable solution. We don't
have any other major work that depends on it do we?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Mon, Jun 5, 2017 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm starting to like the approach of reverting the entire transition
tables patch. Failing to consider the possibility of a plan with
multiple ModifyTable nodes seems like a pretty fundamental design
mistake, and I'm not eager either to ship this with that broken or try
to fix it at this stage of the release cycle.

Postponing the feature to v11 might be a viable solution. We don't
have any other major work that depends on it do we?

I don't think anything that depends on it was committed to v10.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#8)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Jun 3, 2017 at 10:39 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

In the meantime, it seems like you agree that rejecting wCTEs that
affect tables with triggers with transition tables is the best
response to this bug report? Do you think that parse analysis is the
right time to do the check? Here's a first attempt at that.

FWIW, parse analysis is surely NOT the time for such a check. Triggers
might get added to a table between analysis and execution. I think you
might have to do it during executor startup.

Hmm. My understanding: In analyzeCTE(), where the check is done in
that patch, we hold a lock on the relation because we have a
RangeTblEntry. That means that no triggers can be added concurrently
in the case where you go on to execute the plan. If you save the plan
for later execution, triggers created between then and lock
reacquisition (because creating triggers touches pg_class) cause
reanalysis. This is not fundamentally different from altering the
table. For example, with that patch:

postgres=# prepare ps as with t as (insert into table1 values (1))
insert into table2 values ('hello');
PREPARE
postgres=# create trigger trig1 after insert on table1 referencing new
table as oo for each row execute procedure trigfunc();
CREATE TRIGGER
postgres=# execute ps;
ERROR: WITH clause cannot modify a table that has triggers with
transition tables

What am I missing?

I'm starting to like the approach of reverting the entire transition
tables patch. Failing to consider the possibility of a plan with
multiple ModifyTable nodes seems like a pretty fundamental design
mistake, and I'm not eager either to ship this with that broken or try
to fix it at this stage of the release cycle.

Not handling wCTEs or inheritance definitely counts as a howler, but
we seem tantalisingly close and it would be a shame to push the whole
feature back IMHO. I really hope we can move on to talking about
incremental matviews in the next cycle, which is why I've jumped on
every problem report so far.

I'm still waiting to hear from Kevin whether he thinks what I proposed
for the inheritance bug has legs. If so, my vote from the peanut
gallery would be to keep transition tables in the tree but block wCTEs
with transition tables, and then add support in the next release
(which I'd be happy to work on). Support will certainly be needed
ASAP, because it'd suck if incremental matview maintenance didn't work
just because you use wCTEs, and the 'action at a distance' factor
isn't a great user experience (adding a new trigger can in general
break existing queries, but breaking them just because they use wCTEs
is arguably surprising). On the other hand, if he thinks that the
inheritance bug is not adequately fixed by my proposal (with minor
tweaks) and needs a completely different approach, then I'm out (but
will be happy to help work on this stuff for PG11).

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#10)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

FWIW, parse analysis is surely NOT the time for such a check. Triggers
might get added to a table between analysis and execution. I think you
might have to do it during executor startup.

Hmm. My understanding: In analyzeCTE(), where the check is done in
that patch, we hold a lock on the relation because we have a
RangeTblEntry. That means that no triggers can be added concurrently
in the case where you go on to execute the plan. If you save the plan
for later execution, triggers created between then and lock
reacquisition (because creating triggers touches pg_class) cause
reanalysis.

No, they don't necessarily. One case that will certainly not be
covered by such a test is if the wCTE is an UPDATE or DELETE on
an inheritance hierarchy: parse analysis has no idea whether the
target table has children at all, let alone whether any of the
children have triggers that use transition tables. You really want
the test to happen after the planner has expanded inheritance.

Another problem is that the result of parse analysis is frozen for much
longer than you're thinking about in the case of stored rules or views.
We currently disallow views that contain writable CTEs, but I do not
think there's such a prohibition for rules.

More generally, the problem I've got with this proposal is that it causes
the presence of triggers to be significant much earlier in query execution
than they used to be. I'm concerned that this may break optimizations
that are already in place somewhere, or that we might want to add later.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#11)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 10:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Tue, Jun 6, 2017 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

FWIW, parse analysis is surely NOT the time for such a check. Triggers
might get added to a table between analysis and execution. I think you
might have to do it during executor startup.

Hmm. My understanding: In analyzeCTE(), where the check is done in
that patch, we hold a lock on the relation because we have a
RangeTblEntry. That means that no triggers can be added concurrently
in the case where you go on to execute the plan. If you save the plan
for later execution, triggers created between then and lock
reacquisition (because creating triggers touches pg_class) cause
reanalysis.

No, they don't necessarily. One case that will certainly not be
covered by such a test is if the wCTE is an UPDATE or DELETE on
an inheritance hierarchy: parse analysis has no idea whether the
target table has children at all, let alone whether any of the
children have triggers that use transition tables. You really want
the test to happen after the planner has expanded inheritance.

If Kevin accepts my other patch or at least the approach, we won't
allow any child table triggers with transition tables, so we wouldn't
need to consider them here (triggers on the named result table see
tuples collected from children, but ROW triggers attached to children
can't have transition tables and STATEMENT triggers attached to
children don't fire). Though those decisions probably wouldn't be
future-proof, we'd only want this block in place until we had support
for wCTEs.

Another problem is that the result of parse analysis is frozen for much
longer than you're thinking about in the case of stored rules or views.
We currently disallow views that contain writable CTEs, but I do not
think there's such a prohibition for rules.

Ugh. Right. I can indeed circumvent the above check with a wCTE
hiding in a rule, so parse analysis is not the right time for the
check. The whole rewrite area is clearly something I need to go and
learn about. Thanks for the explanation.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Craig Ringer
craig@2ndquadrant.com
In reply to: Kevin Grittner (#5)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On 4 June 2017 at 06:41, Kevin Grittner <kgrittn@gmail.com> wrote:

On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

So, afterTriggers.query_stack is used to handle the reentrancy that
results from triggers running further statements that might fire
triggers. It isn't used for dealing with extra ModifyTable nodes that
can appear in a plan because of wCTEs. Could it also be used for that
purpose? I think that would only work if it is the case that each
ModifyTable node begin and then runs to completion (ie no interleaving
of wCTE execution) and then its AS trigger fires, which I'm not sure
about.

I don't think we want to commit to anything that depends on a CTE
creating an optimization fence, although *maybe* that would be OK in
the case of DML as a CTE. That's a pretty special case; I'm not
sure whether the standard discusses it.

It's definitely fine to require a fence for wCTEs. They're an
extension to the standard, and it's pretty much necessary that we not
pull up / push down across them since we don't want to affect the
side-effects (row changes). If there are any cases where it's safe,
they'll take some careful thought.

It's only standard CTEs (SELECT-based) that I think matter for the
optimisation fence behaviour.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Craig Ringer (#13)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 12:58 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 4 June 2017 at 06:41, Kevin Grittner <kgrittn@gmail.com> wrote:

On Fri, Jun 2, 2017 at 8:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

So, afterTriggers.query_stack is used to handle the reentrancy that
results from triggers running further statements that might fire
triggers. It isn't used for dealing with extra ModifyTable nodes that
can appear in a plan because of wCTEs. Could it also be used for that
purpose? I think that would only work if it is the case that each
ModifyTable node begin and then runs to completion (ie no interleaving
of wCTE execution) and then its AS trigger fires, which I'm not sure
about.

I don't think we want to commit to anything that depends on a CTE
creating an optimization fence, although *maybe* that would be OK in
the case of DML as a CTE. That's a pretty special case; I'm not
sure whether the standard discusses it.

It's definitely fine to require a fence for wCTEs. They're an
extension to the standard, and it's pretty much necessary that we not
pull up / push down across them since we don't want to affect the
side-effects (row changes). If there are any cases where it's safe,
they'll take some careful thought.

It's only standard CTEs (SELECT-based) that I think matter for the
optimisation fence behaviour.

After sleeping on it, I don't think we need to make that decision here
though. I think it's better to just move the tuplestores into
ModifyTableState so that each embedded DML statement has its own, and
have ModifyTable pass them to the trigger code explicitly. I think
I'd like to do that via the TransitionCaptureState object that I
proposed elsewhere, but I'll hold off on doing anything until I hear
from interested committers on which way we're going here, time being
short.

Call me an anti-globalisation (of variables) protestor.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Kevin Grittner
kgrittn@gmail.com
In reply to: Thomas Munro (#14)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

Nice as it would be to add a SQL standard feature and advance the
effort to get to incremental maintenance of materialized views, and
much as I really appreciate the efforts Thomas has put into trying
to solve these problems, I agree that it is best to revert the
feature. It took years to get an in-depth review, then I was asked
not to commit it because others were working on patches that would
conflict. That just doesn't leave enough time to address these
issues before release. Fundamentally, I'm not sure that there is a
level of interest sufficient to support the effort.

I'll give it a few days for objections before reverting.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Marko Tiikkaja
marko@joh.to
In reply to: Kevin Grittner (#15)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Nice as it would be to add a SQL standard feature and advance the
effort to get to incremental maintenance of materialized views, and
much as I really appreciate the efforts Thomas has put into trying
to solve these problems, I agree that it is best to revert the
feature. It took years to get an in-depth review, then I was asked
not to commit it because others were working on patches that would
conflict. That just doesn't leave enough time to address these
issues before release. Fundamentally, I'm not sure that there is a
level of interest sufficient to support the effort.

I'll give it a few days for objections before reverting.

I can only say that the lack of this feature comes up on a weekly basis on
IRC, and a lot of people would be disappointed to see it reverted.

.m

#17Kevin Grittner
kgrittn@gmail.com
In reply to: Marko Tiikkaja (#16)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 3:41 PM, Marko Tiikkaja <marko@joh.to> wrote:

On Tue, Jun 6, 2017 at 10:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

It took years to get an in-depth review, then I was asked
not to commit it because others were working on patches that would
conflict. That just doesn't leave enough time to address these
issues before release. Fundamentally, I'm not sure that there is a
level of interest sufficient to support the effort.

I can only say that the lack of this feature comes up on a weekly basis on
IRC, and a lot of people would be disappointed to see it reverted.

Well, at PGCon I talked with someone who worked on the
implementation in Oracle 20-some years ago. He said they had a team
of 20 people full time working on the feature for years to get it
working. Now, I think the PostgreSQL community is a little lighter
on its feet, but without more active participation from others than
there has been so far, I don't intend to take another run at it.
I'll be happy to participate at such point that it's not treated as
a second-class feature set. Barring that, anyone else who wants to
take the patch and run with it is welcome to do so.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#15)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 4:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

Nice as it would be to add a SQL standard feature and advance the
effort to get to incremental maintenance of materialized views, and
much as I really appreciate the efforts Thomas has put into trying
to solve these problems, I agree that it is best to revert the
feature. It took years to get an in-depth review, then I was asked
not to commit it because others were working on patches that would
conflict. That just doesn't leave enough time to address these
issues before release. Fundamentally, I'm not sure that there is a
level of interest sufficient to support the effort.

I find it a little unfair that you are blaming the community for not
taking enough interest in this feature when other people are posting
bug fixes for the feature which you can't find time to review (or even
respond to with an i'm-too-busy email) for lengthy periods of time.
Previously, I took the time to review and commit fixes for several
reported fixes which Thomas developed, while you ignored all of the
relevant threads. Now, he's posted another patch which you said you
would review by last Friday or at the latest by Monday and which you
have not reviewed. He's also suggested that the fix for this issue
probably relies on that one, so getting that patch reviewed is
relevant to this thread also.

I think I'd like to walk back my earlier statements about reverting
this patch just a little bit. Although putting the tuplestore at the
wrong level does seem like a fairly significant design mistake, Thomas
more or less convinced me yesterday on a Skype call that relocating it
to the ModifyTable node might not be that much work. If it's a
150-line patch, it'd probably be less disruptive to install a fix than
to revert the whole thing (and maybe put it back in again, in some
future release). On the other hand, if you aren't willing to write,
review, or commit any further patches to fix bugs in this feature,
then it can really only stay in the tree if somebody else is willing
to assume completely responsibility for it going forward.

So, are you willing and able to put any effort into this, like say
reviewing the patch Thomas posted, and if so when and how much? If
you're just done and you aren't going to put any more work into
maintaining it (for whatever reasons), then I think you should say so
straight out. People shouldn't have to guess whether you're going to
maintain your patch or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Thomas Munro (#14)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Mon, Jun 5, 2017 at 6:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

After sleeping on it, I don't think we need to make that decision here
though. I think it's better to just move the tuplestores into
ModifyTableState so that each embedded DML statement has its own, and
have ModifyTable pass them to the trigger code explicitly.

I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
case -- one for updated tuples, and the other for inserted tuples.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Peter Geoghegan (#19)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 3:47 PM, Peter Geoghegan <pg@bowt.ie> wrote:

I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
case -- one for updated tuples, and the other for inserted tuples.

Also, ISTM that the code within ENRMetadataGetTupDesc() probably
requires more explanation, resource management wise.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Peter Geoghegan (#20)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:

Also, ISTM that the code within ENRMetadataGetTupDesc() probably
requires more explanation, resource management wise.

Also, it's not clear why it should be okay that the new type of
ephemeral RTEs introduced don't have permissions checks. There are
currently cases where the user cannot see data that they inserted
themselves (e.g., through RETURNING), on the theory that a before row
trigger might have modified the final contents of the tuple in a way
that the original inserter isn't supposed to know details about.

As the INSERT docs say, "Use of the RETURNING clause requires SELECT
privilege on all columns mentioned in RETURNING". Similarly, the
excluded.* pseudo-relation requires select privilege (on the
corresponding target relation columns) in order to be usable by ON
CONFLICT DO UPDATE.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Adam Brusselback
adambrusselback@gmail.com
In reply to: Marko Tiikkaja (#16)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

I'll give it a few days for objections before reverting.

I can only say that the lack of this feature comes up on a weekly basis on
IRC, and a lot of people would be disappointed to see it reverted.

Not that my opinion matters, but I was very much looking forward to this
feature in Postgres 10, but I understand if it just won't be stable enough
to stay in.

I have my fingers crossed these issues can be resolved.

#23Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Robert Haas (#18)
1 attachment(s)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 9:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I think I'd like to walk back my earlier statements about reverting
this patch just a little bit. Although putting the tuplestore at the
wrong level does seem like a fairly significant design mistake, Thomas
more or less convinced me yesterday on a Skype call that relocating it
to the ModifyTable node might not be that much work. If it's a
150-line patch, it'd probably be less disruptive to install a fix than
to revert the whole thing (and maybe put it back in again, in some
future release).

I spent a couple of hours drafting a proof-of-concept to see if my
hunch was right. It seems to work correctly so far and isn't huge
(but certainly needs more testing and work):

6 files changed, 156 insertions(+), 109 deletions(-)

It applies on top of the other patch[1]/messages/by-id/CAEepm=1dGNzh98Gt21fn_Ed6k20sVB-NuAARE1EF693itK6=Lg@mail.gmail.com. It extends
TransitionCaptureState to hold the the new and old tuplestores for
each ModifyTable node, instead of using global variables. The result
is that tuplestores don't get mixed up, and there aren't any weird
assumptions about the order of execution as discussed earlier.
Example:

with wcte as (insert into table1 values (42))
insert into table2 values ('hello world');
NOTICE: trigger = table2_trig, old table = <NULL>, new table =
("hello world")
NOTICE: trigger = table1_trig, old table = <NULL>, new table = (42)

Summary of how these patches relate:

1. In the inheritance patch[1]/messages/by-id/CAEepm=1dGNzh98Gt21fn_Ed6k20sVB-NuAARE1EF693itK6=Lg@mail.gmail.com, TransitionCaptureState is introduced.
It holds flags that control whether we capture tuples. There is one
of these per ModifyTable node. In master we use the flags in
TriggerDesc to control transition tuple capture directly, but we
needed a way for ModifyTable's result rel's TriggerDesc to affect all
child tables that are touched. My proposal is to do that by inventing
this new object to activate transition tuple capture while modifying
child tables too. It is passed into the ExecAR*Trigger() functions of
all relations touched by the ModifyTable node.

2. In the attached patch, that struct is extended to hold the actual
tuplestores. They are used for two purposes: ExecAR*Trigger()
captures tuples into them (instead of using global variables to find
the tuplestores to capture tuples into), and ExecA[RS]*Trigger() keeps
hold of the TransitionCaptureState in the after trigger queue so that
when the queued event is eventually executed AfterTriggerExecute() can
expose the correct tuplestores to triggers.

There are a couple of things that definitely need work and I'd welcome
any comments:

1. I added a pointer to TransitionCaptureState to AfterTriggerShared,
in order to record which tuplestores a queued after trigger event
should see. I suspected that enqueuing pointers like that wouldn't be
popular, and when I ran the idea past Andres on IRC he just said
"yuck" :-) Perhaps there needs to be a way to convert this into an
index into some array in EState, ... or something else. The basic
requirement is that the AfterTriggerExecute() needs to know *which*
tuplestores should be visible to the trigger when it runs. I believe
the object lifetime is sound (the TransitionCaptureState lasts until
ExecutorEnd(), and triggers are fired before that during
ExecutorFinish()).

2. I didn't think about what execReplication.c needs. Although that
code apparently doesn't know how to fire AS triggers, it does know how
to fire AR triggers (so that RI works?), and in theory those might
have transition tables, so I guess that needs to use
MakeTransitionCaptureState() -- but it seems to lack a place to keep
that around, and I ran out of time thinking about that today.

Thoughts?

[1]: /messages/by-id/CAEepm=1dGNzh98Gt21fn_Ed6k20sVB-NuAARE1EF693itK6=Lg@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

transition-tuples-from-wctes-v1.patchapplication/octet-stream; name=transition-tuples-from-wctes-v1.patchDownload
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 820799eaa90..8f760e50006 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1416,6 +1416,12 @@ BeginCopy(ParseState *pstate,
 					 errmsg("table \"%s\" does not have OIDs",
 							RelationGetRelationName(cstate->rel))));
 
+		/*
+		 * If there are any triggers with transition tables on the named
+		 * relation, we need to be prepared to capture transition tuples.
+		 */
+		cstate->transition_capture = MakeTransitionCaptureState(rel->trigdesc);
+
 		/* Initialize state for CopyFrom tuple routing. */
 		if (is_from && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
@@ -1440,14 +1446,6 @@ BeginCopy(ParseState *pstate,
 			cstate->partition_tuple_slot = partition_tuple_slot;
 
 			/*
-			 * If there are any triggers with transition tables on the named
-			 * relation, we need to be prepared to capture transition tuples
-			 * from child relations too.
-			 */
-			cstate->transition_capture =
-				MakeTransitionCaptureState(rel->trigdesc);
-
-			/*
 			 * If we are capturing transition tuples, they may need to be
 			 * converted from partition format back to partitioned table
 			 * format (this is only ever necessary if a BEFORE trigger
@@ -2788,7 +2786,7 @@ CopyFrom(CopyState cstate)
 		pq_endmsgread();
 
 	/* Execute AFTER STATEMENT insertion triggers */
-	ExecASInsertTriggers(estate, resultRelInfo);
+	ExecASInsertTriggers(estate, resultRelInfo, cstate->transition_capture);
 
 	/* Handle queued AFTER triggers */
 	AfterTriggerEndQuery(estate);
@@ -2916,7 +2914,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			cstate->cur_lineno = firstBufferedLineNo + i;
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 bufferedTuples[i],
-								 NIL, NULL);
+								 NIL, cstate->transition_capture);
 		}
 	}
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d90f0474fa5..31bde8371a7 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2071,9 +2071,10 @@ FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc)
 /*
  * Make a TransitionCaptureState object from a given TriggerDesc.  The
  * resulting object holds the flags which control whether transition tuples
- * are collected when tables are modified.  This allows us to use the flags
- * from a parent table to control the collection of transition tuples from
- * child tables.
+ * are collected when tables are modified, and the tuplestores themselves.
+ * Note that we copy the flags from a parent table into this struct (rather
+ * than using each relation's TriggerDesc directly) so that we can use it to
+ * control the collection of transition tuples from child tables.
  *
  * If there are no triggers with transition tables configured for 'trigdesc',
  * then return NULL.
@@ -2091,17 +2092,49 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc)
 		(trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table ||
 		 trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table))
 	{
+		MemoryContext oldcxt;
+		ResourceOwner saveResourceOwner;
+
+		oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+		saveResourceOwner = CurrentResourceOwner;
+
 		state = (TransitionCaptureState *)
 			palloc0(sizeof(TransitionCaptureState));
 		state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
 		state->tcs_update_old_table = trigdesc->trig_update_old_table;
 		state->tcs_update_new_table = trigdesc->trig_update_new_table;
 		state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+		PG_TRY();
+		{
+			CurrentResourceOwner = TopTransactionResourceOwner;
+			if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
+				state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+			if (trigdesc->trig_insert_new_table || trigdesc->trig_update_new_table)
+				state->tcs_new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+		}
+		PG_CATCH();
+		{
+			CurrentResourceOwner = saveResourceOwner;
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+		CurrentResourceOwner = saveResourceOwner;
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return state;
 }
 
+void
+DestroyTransitionCaptureState(TransitionCaptureState *tcs)
+{
+	if (tcs->tcs_new_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_new_tuplestore);
+	if (tcs->tcs_old_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_old_tuplestore);
+	pfree(tcs);
+}
+
 /*
  * Call a trigger function.
  *
@@ -2260,13 +2293,14 @@ ExecBSInsertTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_insert_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_INSERT,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 TupleTableSlot *
@@ -2470,13 +2504,14 @@ ExecBSDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_delete_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 bool
@@ -2684,7 +2719,8 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
@@ -2692,7 +2728,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  false, NULL, NULL, NIL,
 							  GetUpdatedColumns(relinfo, estate),
-							  NULL);
+							  transition_capture);
 }
 
 TupleTableSlot *
@@ -3363,6 +3399,7 @@ typedef struct AfterTriggerSharedData
 	Oid			ats_tgoid;		/* the trigger's ID */
 	Oid			ats_relid;		/* the relation it's on */
 	CommandId	ats_firing_id;	/* ID for firing cycle */
+	TransitionCaptureState *ats_transition_capture;
 } AfterTriggerSharedData;
 
 typedef struct AfterTriggerEventData *AfterTriggerEvent;
@@ -3468,9 +3505,6 @@ typedef struct AfterTriggerEventList
  * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples
  * needed for the current query.
  *
- * old_tuplestores[query_depth] and new_tuplestores[query_depth] hold the
- * transition relations for the current query.
- *
  * maxquerydepth is just the allocated length of query_stack and the
  * tuplestores.
  *
@@ -3503,8 +3537,6 @@ typedef struct AfterTriggersData
 	AfterTriggerEventList *query_stack; /* events pending from each query */
 	Tuplestorestate **fdw_tuplestores;	/* foreign tuples for one row from
 										 * each query */
-	Tuplestorestate **old_tuplestores;	/* all old tuples from each query */
-	Tuplestorestate **new_tuplestores;	/* all new tuples from each query */
 	int			maxquerydepth;	/* allocated len of above array */
 	MemoryContext event_cxt;	/* memory context for events, if any */
 
@@ -3525,7 +3557,8 @@ static void AfterTriggerExecute(AfterTriggerEvent event,
 					Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2);
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture);
 static SetConstraintState SetConstraintStateCreate(int numalloc);
 static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
 static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
@@ -3534,8 +3567,6 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
 
 /*
  * Gets a current query transition tuplestore and initializes it if necessary.
- * This can be holding a single transition row tuple (in the case of an FDW)
- * or a transition table (for an AFTER trigger).
  */
 static Tuplestorestate *
 GetTriggerTransitionTuplestore(Tuplestorestate **tss)
@@ -3715,6 +3746,7 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 		if (newshared->ats_tgoid == evtshared->ats_tgoid &&
 			newshared->ats_relid == evtshared->ats_relid &&
 			newshared->ats_event == evtshared->ats_event &&
+			newshared->ats_transition_capture == evtshared->ats_transition_capture &&
 			newshared->ats_firing_id == 0)
 			break;
 	}
@@ -3826,7 +3858,8 @@ AfterTriggerExecute(AfterTriggerEvent event,
 					FmgrInfo *finfo, Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2)
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture)
 {
 	AfterTriggerShared evtshared = GetTriggerSharedData(event);
 	Oid			tgoid = evtshared->ats_tgoid;
@@ -3941,16 +3974,14 @@ AfterTriggerExecute(AfterTriggerEvent event,
 	/*
 	 * Set up the tuplestore information.
 	 */
-	if (LocTriggerData.tg_trigger->tgoldtable)
-		LocTriggerData.tg_oldtable =
-			GetTriggerTransitionTuplestore(afterTriggers.old_tuplestores);
-	else
-		LocTriggerData.tg_oldtable = NULL;
-	if (LocTriggerData.tg_trigger->tgnewtable)
-		LocTriggerData.tg_newtable =
-			GetTriggerTransitionTuplestore(afterTriggers.new_tuplestores);
-	else
-		LocTriggerData.tg_newtable = NULL;
+	LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
+	if (transition_capture != NULL)
+	{
+		if (LocTriggerData.tg_trigger->tgoldtable)
+			LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore;
+		if (LocTriggerData.tg_trigger->tgnewtable)
+			LocTriggerData.tg_newtable = transition_capture->tcs_new_tuplestore;
+	}
 
 	/*
 	 * Setup the remaining trigger information
@@ -4158,7 +4189,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				 * won't try to re-fire it.
 				 */
 				AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
-									per_tuple_context, slot1, slot2);
+									per_tuple_context, slot1, slot2,
+									evtshared->ats_transition_capture);
 
 				/*
 				 * Mark the event as done.
@@ -4232,8 +4264,6 @@ AfterTriggerBeginXact(void)
 	Assert(afterTriggers.state == NULL);
 	Assert(afterTriggers.query_stack == NULL);
 	Assert(afterTriggers.fdw_tuplestores == NULL);
-	Assert(afterTriggers.old_tuplestores == NULL);
-	Assert(afterTriggers.new_tuplestores == NULL);
 	Assert(afterTriggers.maxquerydepth == 0);
 	Assert(afterTriggers.event_cxt == NULL);
 	Assert(afterTriggers.events.head == NULL);
@@ -4278,8 +4308,6 @@ AfterTriggerEndQuery(EState *estate)
 {
 	AfterTriggerEventList *events;
 	Tuplestorestate *fdw_tuplestore;
-	Tuplestorestate *old_tuplestore;
-	Tuplestorestate *new_tuplestore;
 
 	/* Must be inside a query, too */
 	Assert(afterTriggers.query_depth >= 0);
@@ -4338,18 +4366,6 @@ AfterTriggerEndQuery(EState *estate)
 		tuplestore_end(fdw_tuplestore);
 		afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 	}
-	old_tuplestore = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-	if (old_tuplestore)
-	{
-		tuplestore_end(old_tuplestore);
-		afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-	}
-	new_tuplestore = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-	if (new_tuplestore)
-	{
-		tuplestore_end(new_tuplestore);
-		afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-	}
 	afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 
 	afterTriggers.query_depth--;
@@ -4463,8 +4479,6 @@ AfterTriggerEndXact(bool isCommit)
 	 */
 	afterTriggers.query_stack = NULL;
 	afterTriggers.fdw_tuplestores = NULL;
-	afterTriggers.old_tuplestores = NULL;
-	afterTriggers.new_tuplestores = NULL;
 	afterTriggers.maxquerydepth = 0;
 	afterTriggers.state = NULL;
 
@@ -4597,18 +4611,6 @@ AfterTriggerEndSubXact(bool isCommit)
 					tuplestore_end(ts);
 					afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 				}
-				ts = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-				}
-				ts = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-				}
 
 				afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 			}
@@ -4688,12 +4690,6 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			MemoryContextAllocZero(TopTransactionContext,
 								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 	else
@@ -4709,19 +4705,9 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			repalloc(afterTriggers.fdw_tuplestores,
 					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.old_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.new_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
 		/* Clear newly-allocated slots for subsequent lazy initialization. */
 		memset(afterTriggers.fdw_tuplestores + old_alloc,
 			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.old_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.new_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 
@@ -5255,42 +5241,39 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		if ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
 			(event == TRIGGER_EVENT_UPDATE && update_old_table))
 		{
-			Tuplestorestate *old_tuplestore;
-
 			Assert(oldtup != NULL);
-			old_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.old_tuplestores);
+
 			if (map != NULL)
 			{
 				HeapTuple	converted = do_convert_tuple(oldtup, map);
 
-				tuplestore_puttuple(old_tuplestore, converted);
+				tuplestore_puttuple(transition_capture->tcs_old_tuplestore,
+									converted);
 				pfree(converted);
 			}
 			else
-				tuplestore_puttuple(old_tuplestore, oldtup);
+				tuplestore_puttuple(transition_capture->tcs_old_tuplestore,
+									oldtup);
 		}
 		if ((event == TRIGGER_EVENT_INSERT && insert_new_table) ||
 			(event == TRIGGER_EVENT_UPDATE && update_new_table))
 		{
-			Tuplestorestate *new_tuplestore;
-
 			Assert(newtup != NULL);
-			new_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.new_tuplestores);
+
 			if (original_insert_tuple != NULL)
-				tuplestore_puttuple(new_tuplestore, original_insert_tuple);
+				tuplestore_puttuple(transition_capture->tcs_new_tuplestore,
+									original_insert_tuple);
 			else if (map != NULL)
 			{
 				HeapTuple	converted = do_convert_tuple(newtup, map);
 
-				tuplestore_puttuple(new_tuplestore, converted);
+				tuplestore_puttuple(transition_capture->tcs_new_tuplestore,
+									converted);
 				pfree(converted);
 			}
 			else
-				tuplestore_puttuple(new_tuplestore, newtup);
+				tuplestore_puttuple(transition_capture->tcs_new_tuplestore,
+									newtup);
 		}
 
 		/* If transition tables are the only reason we're here, return. */
@@ -5465,6 +5448,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		new_shared.ats_tgoid = trigger->tgoid;
 		new_shared.ats_relid = RelationGetRelid(rel);
 		new_shared.ats_firing_id = 0;
+		new_shared.ats_transition_capture = transition_capture;
 
 		afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
 							 &new_event, &new_shared);
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6eb979f17da..2bacf66ac48 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1420,14 +1420,18 @@ fireASTriggers(ModifyTableState *node)
 		case CMD_INSERT:
 			if (node->mt_onconflict == ONCONFLICT_UPDATE)
 				ExecASUpdateTriggers(node->ps.state,
-									 resultRelInfo);
-			ExecASInsertTriggers(node->ps.state, resultRelInfo);
+									 resultRelInfo,
+									 node->mt_transition_capture);
+			ExecASInsertTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_UPDATE:
-			ExecASUpdateTriggers(node->ps.state, resultRelInfo);
+			ExecASUpdateTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_DELETE:
-			ExecASDeleteTriggers(node->ps.state, resultRelInfo);
+			ExecASDeleteTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		default:
 			elog(ERROR, "unknown operation");
@@ -2272,6 +2276,10 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	/* Free transition tables */
+	if (node->mt_transition_capture != NULL)
+		DestroyTransitionCaptureState(node->mt_transition_capture);
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 736f5de67dd..5ce57966245 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -42,8 +42,8 @@ typedef struct TriggerData
 } TriggerData;
 
 /*
- * Meta-data to control the capture of old and new tuples into transition
- * tables from child tables.
+ * The state for capturing old and new tuples into transition tables for a
+ * single ModifyTable node.
  */
 typedef struct TransitionCaptureState
 {
@@ -72,6 +72,10 @@ typedef struct TransitionCaptureState
 	 * the original tuple directly.
 	 */
 	HeapTuple	tcs_original_insert_tuple;
+
+	/* The tuplestores backing the transition tables. */
+	Tuplestorestate *tcs_old_tuplestore;
+	Tuplestorestate *tcs_new_tuplestore;
 } TransitionCaptureState;
 
 /*
@@ -162,13 +166,15 @@ extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);
 
 extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
 extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc);
+extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs);
 
 extern void FreeTriggerDesc(TriggerDesc *trigdesc);
 
 extern void ExecBSInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASInsertTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo,
 					 TupleTableSlot *slot);
@@ -183,7 +189,8 @@ extern TupleTableSlot *ExecIRInsertTriggers(EState *estate,
 extern void ExecBSDeleteTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASDeleteTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern bool ExecBRDeleteTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
@@ -200,7 +207,8 @@ extern bool ExecIRDeleteTriggers(EState *estate,
 extern void ExecBSUpdateTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASUpdateTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRUpdateTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 2b5695f5792..9eab7c38f8f 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2140,5 +2140,29 @@ DETAIL:  ROW triggers with transition tables are not supported in inheritance hi
 drop trigger child_row_trig on child;
 alter table child inherit parent;
 drop table child, parent;
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert or update or delete on table1
+  referencing old table as old_table new table as new_table
+  for each statement
+  execute procedure dump_transition_tables();
+create trigger table2_trig
+  after insert or update or delete on table2
+  referencing old table as old_table new table as new_table
+  for each statement
+  execute procedure dump_transition_tables();
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+NOTICE:  trigger = table2_trig, old table = <NULL>, new table = ("hello world")
+NOTICE:  trigger = table1_trig, old table = <NULL>, new table = (42)
+drop table table1;
+drop table table2;
 -- cleanup
 drop function dump_transition_tables();
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 14f25f85e15..14a1d4b0de1 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1648,5 +1648,30 @@ alter table child inherit parent;
 
 drop table child, parent;
 
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert or update or delete on table1
+  referencing old table as old_table new table as new_table
+  for each statement
+  execute procedure dump_transition_tables();
+create trigger table2_trig
+  after insert or update or delete on table2
+  referencing old table as old_table new table as new_table
+  for each statement
+  execute procedure dump_transition_tables();
+
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+
+drop table table1;
+drop table table2;
+
 -- cleanup
 drop function dump_transition_tables();
#24Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Peter Geoghegan (#21)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 12:19 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:

Also, ISTM that the code within ENRMetadataGetTupDesc() probably
requires more explanation, resource management wise.

Also, it's not clear why it should be okay that the new type of
ephemeral RTEs introduced don't have permissions checks. There are
currently cases where the user cannot see data that they inserted
themselves (e.g., through RETURNING), on the theory that a before row
trigger might have modified the final contents of the tuple in a way
that the original inserter isn't supposed to know details about.

As the INSERT docs say, "Use of the RETURNING clause requires SELECT
privilege on all columns mentioned in RETURNING". Similarly, the
excluded.* pseudo-relation requires select privilege (on the
corresponding target relation columns) in order to be usable by ON
CONFLICT DO UPDATE.

This is an interesting question and one that was mentioned here (near
the bottom):

/messages/by-id/CACjxUsO=squN2XbYBM6qLfS9co=bfGqQFxMfY+pjyAGKP_jpwQ@mail.gmail.com

I'm pretty sure that whatever applies to the NEW and OLD variables
should also apply to the new table and old table, because the former
are conceptually range variables over the latter. Without having
checked either the standard or our behaviour for NEW and OLD, I'll bet
you one internet point that they say we should apply the same access
restrictions as the underlying table, and we don't. Am I wrong?

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Peter Geoghegan (#19)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Jun 5, 2017 at 6:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

After sleeping on it, I don't think we need to make that decision here
though. I think it's better to just move the tuplestores into
ModifyTableState so that each embedded DML statement has its own, and
have ModifyTable pass them to the trigger code explicitly.

I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
case -- one for updated tuples, and the other for inserted tuples.

Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER
INSERT and AFTER UPDATE statement-level triggers to be fired, but then
both kinds see all the tuples -- those that were inserted and those
that were updated. That's not right.

For example:

postgres=# insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1)
postgres-# on conflict (a) do
postgres-# update set counter = my_table.counter + excluded.counter;
NOTICE: trigger = my_update_trigger, old table = (ID1,1), (ID2,1),
new table = (ID1,2), (ID2,2), (ID3,1)
NOTICE: trigger = my_insert_trigger, new table = (ID1,2), (ID2,2), (ID3,1)
INSERT 0 3

That's the result of the following:

create or replace function dump_insert() returns trigger language plpgsql as
$$
begin
raise notice 'trigger = %, new table = %',
TG_NAME,
(select string_agg(new_table::text, ', ' order by a) from new_table);
return null;
end;
$$;

create or replace function dump_update() returns trigger language plpgsql as
$$
begin
raise notice 'trigger = %, old table = %, new table = %',
TG_NAME,
(select string_agg(old_table::text, ', ' order by a) from old_table),
(select string_agg(new_table::text, ', ' order by a) from new_table);
return null;
end;
$$;

create table my_table (a text primary key, counter int);

insert into my_table values ('ID1', 1), ('ID2', 1);

create trigger my_insert_trigger
after insert on my_table
referencing new table as new_table
for each statement
execute procedure dump_insert();

create trigger my_update_trigger
after update on my_table
referencing old table as old_table new table as new_table
for each statement
execute procedure dump_update();

insert into my_table values ('ID1', 1), ('ID2', 1), ('ID3', 1)
on conflict (a) do
update set counter = my_table.counter + excluded.counter;

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#25)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:

I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
case -- one for updated tuples, and the other for inserted tuples.

Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER
INSERT and AFTER UPDATE statement-level triggers to be fired, but then
both kinds see all the tuples -- those that were inserted and those
that were updated. That's not right.

Or maybe it is right. We're out of spec here, so we'd basically have
to decide (though MERGE's behaviour would be interesting to compare
with). What we have seems reasonable for an AFTER INSERT OR UPDATE
statement-level trigger, I think. It's just a bit questionable if you
asked for just one of those things. The question is whether the
trigger's 'when' clause is supposed to refer to the type of statement
you ran or the way individual rows turned out to be processed in our
out-of-standard ON CONFLICT case. If you take the former view then we
could simply decree that such a statement is both an INSERT and an
UPDATE for trigger selection purposes.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Kevin Grittner
kgrittn@gmail.com
In reply to: Thomas Munro (#26)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:

I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
case -- one for updated tuples, and the other for inserted tuples.

Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER
INSERT and AFTER UPDATE statement-level triggers to be fired, but then
both kinds see all the tuples -- those that were inserted and those
that were updated. That's not right.

Or maybe it is right.

The idea of transition tables is that you see all changes to the
target of a single statement in the form of delta relations -- with
and "old" table for any rows affected by a delete or update and a
"new" table for any rows affected by an update or delete. If we
have a single statement that combines INSERT and UPDATE behaviors,
it might make sense to have an "old" table for updates and a single
"new" table for both.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Kevin Grittner
kgrittn@gmail.com
In reply to: Robert Haas (#18)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 4:42 PM, Robert Haas <robertmhaas@gmail.com> wrote:

So, are you willing and able to put any effort into this, like say
reviewing the patch Thomas posted, and if so when and how much? If
you're just done and you aren't going to put any more work into
maintaining it (for whatever reasons), then I think you should say so
straight out. People shouldn't have to guess whether you're going to
maintain your patch or not.

It has become clear that the scope of problems being found now
exceed what I can be sure of being able to fix in time to make for a
stable release, in spite of the heroic efforts Thomas has been
putting in. I had hoped to get this into the first or second CF of
this release, same with the release before, and same with the
release before that. At least landing it in the final CF drew the
level of review and testing needed to polish it, but it's far from
ideal timing (or procedure). I can revert from v10 and deal with
all of this for the first CF of some future release, but if someone
feels they can deal with it in v10 I'll stand back and offer what
help I can.

You mentioned blame earlier. That seems pointless to me. I'm
looking to see what works and what doesn't. When I went to develop
SSI it was because my employer needed it, and they asked what it
would take to get that done; I said one year of my time without
being drawn off for other tasks to get it to a point where they
would be better off using it than not, and then two years half time
work to address community concerns and get it committed, and follow
up on problems. They gave me that and it worked, and worked well.
In this case, resources to carry it through were not reserved when I
started, and when I became full-up with other things, then problems
surfaced. That doesn't work. I don't want to start something big
again until I have resources set up and reserved, as a priority, to
see it through. It's a matter of what works for both me and the
community and what doesn't.

In the meantime I'll try to keep to small enough issues that the
resources required to support what I do is not beyond what I can
deliver.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Kevin Grittner (#27)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 9:19 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Wed, Jun 7, 2017 at 3:42 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 7, 2017 at 7:27 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:

I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
case -- one for updated tuples, and the other for inserted tuples.

Hmm. Right. INSERT ... ON CONFLICT DO UPDATE causes both AFTER
INSERT and AFTER UPDATE statement-level triggers to be fired, but then
both kinds see all the tuples -- those that were inserted and those
that were updated. That's not right.

Or maybe it is right.

The idea of transition tables is that you see all changes to the
target of a single statement in the form of delta relations -- with
and "old" table for any rows affected by a delete or update and a
"new" table for any rows affected by an update or delete. If we
have a single statement that combines INSERT and UPDATE behaviors,
it might make sense to have an "old" table for updates and a single
"new" table for both.

That would be cool because that's the behaviour we have.

Is there anything about that semantics that is incompatible with the
incremental matview use case? For example, are the "counting" and
"DRed" algorithms you've proposed (for non-recursive and recursive
views) driven entirely by deletions and insertions, so that updates
look like deletes followed by inserts anyway? If so, I think our
current treatment of ON CONFLICT DO UPDATE should be fine: you can't
tell whether the tuples in the new table result from insert or update
without also consulting the old table, but if the algorithm treats all
tuples in the old table as deletes (even though in this case they
result from updates) and all tuples in the new table as inserts (even
though in this case *some* of them result from updates) anyway then I
don't think it matters.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Kevin Grittner (#27)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 2:19 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

The idea of transition tables is that you see all changes to the
target of a single statement in the form of delta relations -- with
and "old" table for any rows affected by a delete or update and a
"new" table for any rows affected by an update or delete. If we
have a single statement that combines INSERT and UPDATE behaviors,
it might make sense to have an "old" table for updates and a single
"new" table for both.

My assumption would be that since you have as many as two
statement-level triggers firing that could reference transition tables
when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level
trigger, and another AFTER INSERT statement level trigger), there'd be
separation at that level. You'd see updated tuples with one, and
inserted within the other. While INSERT ON CONFLICT DO UPDATE is
essentially one statement, it behaves as two statements in certain
limited ways. IIRC MERGE as implemented in other systems has severe
restrictions on things like row level triggers work (i.e. they simply
don't work), and so they don't provide much in the way of guidance.

My assumption about how transition tables ought to behave here is
based on the simple fact that we already fire both AFTER
statement-level triggers, plus my sense of aesthetics, or bias. I
admit that I might be missing the point, but if I am it would be
useful to know how.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Kevin Grittner
kgrittn@gmail.com
In reply to: Thomas Munro (#29)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 4:48 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

Is there anything about that semantics that is incompatible with the
incremental matview use case?

Nothing incompatible at all. If we had separate "new" tables for
UPDATE and DELETE we would logically need to do a "counting"-style
UNION of them just like we will want to do with the OLD (with a
count of -1) and NEW (with a count of 1) to get to a delta now. So
keeping them separate would be marginally more work for incremental
matview, but not a big deal.

For example, are the "counting" and
"DRed" algorithms you've proposed (for non-recursive and recursive
views) driven entirely by deletions and insertions, so that updates
look like deletes followed by inserts anyway?

Counting is best done from a "delta" relation which has old and new
combined with opposite counts. I'm sure that will be fine with
DRed, too.

If so, I think our
current treatment of ON CONFLICT DO UPDATE should be fine: you can't
tell whether the tuples in the new table result from insert or update
without also consulting the old table, but if the algorithm treats all
tuples in the old table as deletes (even though in this case they
result from updates) and all tuples in the new table as inserts (even
though in this case *some* of them result from updates) anyway then I
don't think it matters.

I don't think so, either.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Kevin Grittner
kgrittn@gmail.com
In reply to: Peter Geoghegan (#30)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan <pg@bowt.ie> wrote:

My assumption about how transition tables ought to behave here is
based on the simple fact that we already fire both AFTER
statement-level triggers, plus my sense of aesthetics, or bias. I
admit that I might be missing the point, but if I am it would be
useful to know how.

Well, either will work. My inclination is that a single statement
should cause one execution of the FOR EACH STATEMENT trigger, but if
a good case can be made that we should have a FOR EACH STATEMENT
trigger fire for each clause within a statement -- well, it won't be
a problem for matview maintenance. The biggest hurt there would be
to *my* sense of aesthetics. ;-)

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Peter Geoghegan (#30)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 3:00 PM, Peter Geoghegan <pg@bowt.ie> wrote:

My assumption would be that since you have as many as two
statement-level triggers firing that could reference transition tables
when ON CONFLICT DO UPDATE is used (one AFTER UPDATE statement level
trigger, and another AFTER INSERT statement level trigger), there'd be
separation at that level. You'd see updated tuples with one, and
inserted within the other. While INSERT ON CONFLICT DO UPDATE is
essentially one statement, it behaves as two statements in certain
limited ways.

BTW, as you probably already know INSERT ON CONFLICT DO UPDATE is
unable to affect the same row version a second time -- if we go on to
update a row that the same upsert statement itself actually originally
inserted, then a cardinality violation error is raised (recall that
upsert does not actually use MVCC semantics to detect conflicting
tuples). This differs from the UPDATE situation (I should really say
the UPDATE ... FROM situation), where the implementation has always
silently not updated the row a second time (to avoid the halloween
problem).

This cardinality violation restriction is a useful one for us to have
imposed, because there is now no question about the same row appearing
more than once. We know that a given row (any tuple from a single
update chain) cannot appear once for an INSERT statement level trigger
firing (appearing in the transition table there), while also making a
second appearance for the UPDATE statement level trigger.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Kevin Grittner (#32)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 10:21 AM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Wed, Jun 7, 2017 at 5:00 PM, Peter Geoghegan <pg@bowt.ie> wrote:

My assumption about how transition tables ought to behave here is
based on the simple fact that we already fire both AFTER
statement-level triggers, plus my sense of aesthetics, or bias. I
admit that I might be missing the point, but if I am it would be
useful to know how.

Well, either will work. My inclination is that a single statement
should cause one execution of the FOR EACH STATEMENT trigger, but if
a good case can be made that we should have a FOR EACH STATEMENT
trigger fire for each clause within a statement -- well, it won't be
a problem for matview maintenance. The biggest hurt there would be
to *my* sense of aesthetics. ;-)

Ok, I think we have two options here:

1. Keep the current behaviour. ON CONFLICT statements cause inserted
and updated tuples to be mixed together in the same 'new table'
tuplestore. Trigger authors will need to be aware that AFTER INSERT
triggers might see some rows that were actually updated, and vice
versa, which seems moderately surprising. If you defined one AFTER
INSERT trigger and one AFTER UPDATE trigger, you'll see the same rows
in both if someone runs such a statement (as I showed upthread), which
also seems to violate the PoLA.

2. Make a code change that would split the 'new table' tuplestore in
two: an insert tuplestore and an update tuplestore (for new images;
old images could remain in the old tuplestore that is also used for
deletes) as Peter suggests. That raises two questions for me: (1)
Does it really make sense for the 'when' clause, which sounds like it
only controls when we fire a trigger, also to affect which transition
tuples it sees? There is something slightly fishy about that. (2)
Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
of the the two tuplestores? Trigger authors would need to be aware a
single statement can produce a mixture of updates and inserts, but
only if they explicitly invite such problems into their lives with
that incantation.

I think the code change for 2 is not enormous. Split the tuplestore
in two, route tuples appropriately, and teach NamedTuplestoreScan to
scan both if a union is needed. It does sound like the less
user-hostile of the options.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#34)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

1. Keep the current behaviour. [...]

2. Make a code change that would split the 'new table' tuplestore in
two: an insert tuplestore and an update tuplestore (for new images;
old images could remain in the old tuplestore that is also used for
deletes) as Peter suggests. That raises two questions for me: (1)
Does it really make sense for the 'when' clause, which sounds like it
only controls when we fire a trigger, also to affect which transition
tuples it sees? There is something slightly fishy about that. (2)
Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
of the the two tuplestores? Trigger authors would need to be aware a
single statement can produce a mixture of updates and inserts, but
only if they explicitly invite such problems into their lives with
that incantation.

A third option would be for an AFTER INSERT OR UPDATE trigger to be
invoked twice, once for the inserts and once again for the updates.
No union required, but also surprising.

Any other ideas?

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#28)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 5:47 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

It has become clear that the scope of problems being found now
exceed what I can be sure of being able to fix in time to make for a
stable release, in spite of the heroic efforts Thomas has been
putting in. I had hoped to get this into the first or second CF of
this release, same with the release before, and same with the
release before that. At least landing it in the final CF drew the
level of review and testing needed to polish it, but it's far from
ideal timing (or procedure). I can revert from v10 and deal with
all of this for the first CF of some future release, but if someone
feels they can deal with it in v10 I'll stand back and offer what
help I can.

I really don't know what to make of this. I can't ever remember a
committer taking this attitude before. Aside from committing one
patch that Thomas submitted shortly after the original commit, you
haven't been willing to write, review, or commit a single line of
code. The problem doesn't seem to be that the amount of effort is any
more than it would be for any other feature this size; it's in better
shape than some. The problem appears to be, rather, that the scope of
problems you were willing to try to fix in a timely fashion was
basically zero, which isn't very realistic for a patch of this size no
matter who has reviewed it or in what level of detail.

We don't have a lot of formal documentation around the
responsibilities of PostgreSQL committers, but I think one that is
pretty clearly acknowledged is "you are responsible for what you
commit". If you're not willing to be responsible for your own
patches, turn in your commit bit. Then, when you submit a patch,
it'll either not get committed, or it'll be the responsibility of
whoever does.

I freely admit I encouraged you to commit this. I did not imagine
that would be followed immediately by abdicating all responsibility
for it. My mistake, I guess.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Robert Haas (#36)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On 06/08/2017 08:36 PM, Robert Haas wrote:

I freely admit I encouraged you to commit this. I did not imagine
that would be followed immediately by abdicating all responsibility
for it. My mistake, I guess.

Robert, chill out. Kevin offered to revert. It's perhaps not the best
way forward - I'm not familiar with the details here - but it's
certainly one way to take responsibility.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#37)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 1:48 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 06/08/2017 08:36 PM, Robert Haas wrote:

I freely admit I encouraged you to commit this. I did not imagine
that would be followed immediately by abdicating all responsibility
for it. My mistake, I guess.

Robert, chill out.

That's probably good advice, but ...

Kevin offered to revert. It's perhaps not the best way
forward - I'm not familiar with the details here - but it's certainly one
way to take responsibility.

... he also proposed to then commit it again for some future release
cycle, and what then? Revert it again if it turns out to have any
bugs, and commit it a third time in some release cycle after that?
It's a big, invasive patch. I don't think we want it going in and out
of the tree repeatedly. More generally, I don't think there's ever a
time when it's OK to commit a patch that you're not willing to put at
least some effort into fixing up afterwards.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Robert Haas (#38)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:

More generally, I don't think there's ever a
time when it's OK to commit a patch that you're not willing to put at
least some effort into fixing up afterwards.

Kevin said "It has become clear that the scope of problems being found
now exceed what I can be sure of being able to fix in time to make for
a stable release, in spite of the heroic efforts Thomas has been
putting in". I think it's clear that Kevin is willing to put in some
work. The issue is that he is unable to *guarantee* that he'll be able
to put in *sufficient* time, and in light of that concedes that it
might be best to revert and revisit for Postgres 11. He is being
cautious, and does not want to *risk* unduly holding up the release.

That was my understanding, at least.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40Bruce Momjian
bruce@momjian.us
In reply to: Peter Geoghegan (#39)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 11:05:43AM -0700, Peter Geoghegan wrote:

On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:

More generally, I don't think there's ever a
time when it's OK to commit a patch that you're not willing to put at
least some effort into fixing up afterwards.

Kevin said "It has become clear that the scope of problems being found
now exceed what I can be sure of being able to fix in time to make for
a stable release, in spite of the heroic efforts Thomas has been
putting in". I think it's clear that Kevin is willing to put in some
work. The issue is that he is unable to *guarantee* that he'll be able
to put in *sufficient* time, and in light of that concedes that it
might be best to revert and revisit for Postgres 11. He is being
cautious, and does not want to *risk* unduly holding up the release.

That was my understanding, at least.

I think we can all agree that Kevin should have communicated this
earlier, rather than requiring Robert to push him on the issue.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#39)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 2:05 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:

More generally, I don't think there's ever a
time when it's OK to commit a patch that you're not willing to put at
least some effort into fixing up afterwards.

Kevin said "It has become clear that the scope of problems being found
now exceed what I can be sure of being able to fix in time to make for
a stable release, in spite of the heroic efforts Thomas has been
putting in". I think it's clear that Kevin is willing to put in some
work. The issue is that he is unable to *guarantee* that he'll be able
to put in *sufficient* time, and in light of that concedes that it
might be best to revert and revisit for Postgres 11. He is being
cautious, and does not want to *risk* unduly holding up the release.

That was my understanding, at least.

I understood the same. However, he didn't review or commit any of the
bug fix patches that Thomas posted in May, or even respond to the
mailing list threads. I eventually reviewed and committed them to
avoid having the feature reverted; it took me only a few hours.
Later, he said he would review the TransitionCaptureState patch Thomas
posted at PGCon, later said again on-list that he would do so by
Friday or anyway Monday, and it's now Thursday. In other words, I am
not going just by this particular email, but by the fact that he
hasn't committed so much as a one line bug fix or posted any reviews
in a long time. The last non-reverted commit he made related to this
feature was on April 6th, two days after the initial commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Robert Haas (#41)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 11:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I understood the same. However, he didn't review or commit any of the
bug fix patches that Thomas posted in May, or even respond to the
mailing list threads. I eventually reviewed and committed them to
avoid having the feature reverted; it took me only a few hours.
Later, he said he would review the TransitionCaptureState patch Thomas
posted at PGCon, later said again on-list that he would do so by
Friday or anyway Monday, and it's now Thursday. In other words, I am
not going just by this particular email, but by the fact that he
hasn't committed so much as a one line bug fix or posted any reviews
in a long time. The last non-reverted commit he made related to this
feature was on April 6th, two days after the initial commit.

It does seem like Kevin could have done better there. However, I think
that Kevin was taking responsibility for the situation, and that it
wasn't accurate to suggest he blamed others. I thought his account of
the constraints that he operated under was a simple factual account.

I also don't think it's useful for you to discourage revert on the
grounds that it's a slippery slope. Admitting fault doesn't need to be
made any harder.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#42)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 2:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:

It does seem like Kevin could have done better there. However, I think
that Kevin was taking responsibility for the situation, and that it
wasn't accurate to suggest he blamed others. I thought his account of
the constraints that he operated under was a simple factual account.

I also don't think it's useful for you to discourage revert on the
grounds that it's a slippery slope. Admitting fault doesn't need to be
made any harder.

I am not sure that I entirely agree with you on these points, but I
don't want to debate it further and especially not on a public mailing
list. I'm going to spend some time over the next ~24 hours studying
this thread and the patches and determining whether or not I'm willing
to take responsibility for this patch. If I decide that I am, I will
then work on trying to fix the technical problems. If I decide that I
am not, then I think it will be necessary to take Kevin up on his
offer to revert, unless some other committer volunteers. (Of course,
anyone could step in to do the work, as Thomas already has to a
considerable degree, but without a committer involved it doesn't fix
the problem.)

I don't really like either option, because, on the one hand, this is a
pretty invasive thing to go rip out -- maybe we don't have to rip out
the entire patch series, but just some of the later patches? -- and on
the other hand, keeping it in the tree will require work, and I'm
busy. But there don't seem to be any other options.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Robert Haas (#43)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 1:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I am not sure that I entirely agree with you on these points, but I
don't want to debate it further and especially not on a public mailing
list.

Fair enough.

I'm going to spend some time over the next ~24 hours studying
this thread and the patches and determining whether or not I'm willing
to take responsibility for this patch.

Thank you.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Thomas Munro (#4)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

"Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:

Thomas> So, afterTriggers.query_stack is used to handle the reentrancy
Thomas> that results from triggers running further statements that
Thomas> might fire triggers. It isn't used for dealing with extra
Thomas> ModifyTable nodes that can appear in a plan because of wCTEs.
Thomas> Could it also be used for that purpose? I think that would
Thomas> only work if it is the case that each ModifyTable node begin
Thomas> and then runs to completion (ie no interleaving of wCTE
Thomas> execution) and then its AS trigger fires, which I'm not sure
Thomas> about.

There's a problem with this which I didn't see anyone mention (though
I've not read the whole history); existing users of wCTEs rely on the
fact that no AFTER triggers are run until _all_ modifications are
completed. If you change that, then you can't use wCTEs to insert into
tables with FK checks without knowing the order in which the individual
wCTEs are executed, which is currently a bit vague and non-obvious
(which doesn't cause issues at the moment only because nothing actually
depends on the order).

for example:
create table foo (id integer primary key);
create table foo_bar (foo_id integer references foo, bar_id integer);
with i1 as (insert into foo values (1))
insert into foo_bar values (1,2),(1,3);

This would fail the FK check if each insert did its own trigger firing,
since the foo_bar insert is actually run _first_.

Firing triggers earlier than they currently are would thus break
existing working queries.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Robert Haas (#43)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Robert> unless some other committer volunteers. (Of course, anyone
Robert> could step in to do the work, as Thomas already has to a
Robert> considerable degree, but without a committer involved it
Robert> doesn't fix the problem.)

I can probably take this on if needed.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Andrew Gierth (#45)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Fri, Jun 9, 2017 at 8:41 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:

Thomas> So, afterTriggers.query_stack is used to handle the reentrancy
Thomas> that results from triggers running further statements that
Thomas> might fire triggers. It isn't used for dealing with extra
Thomas> ModifyTable nodes that can appear in a plan because of wCTEs.
Thomas> Could it also be used for that purpose? I think that would
Thomas> only work if it is the case that each ModifyTable node begin
Thomas> and then runs to completion (ie no interleaving of wCTE
Thomas> execution) and then its AS trigger fires, which I'm not sure
Thomas> about.

There's a problem with this which I didn't see anyone mention (though
I've not read the whole history); existing users of wCTEs rely on the
fact that no AFTER triggers are run until _all_ modifications are
completed. If you change that, then you can't use wCTEs to insert into
tables with FK checks without knowing the order in which the individual
wCTEs are executed, which is currently a bit vague and non-obvious
(which doesn't cause issues at the moment only because nothing actually
depends on the order).

for example:
create table foo (id integer primary key);
create table foo_bar (foo_id integer references foo, bar_id integer);
with i1 as (insert into foo values (1))
insert into foo_bar values (1,2),(1,3);

This would fail the FK check if each insert did its own trigger firing,
since the foo_bar insert is actually run _first_.

Firing triggers earlier than they currently are would thus break
existing working queries.

Thanks, and I agree. Later I came to the conclusion that we should
neither change nor depend on the order of execution, but in order to
avoid doing those things we simply couldn't keep using query_stack as
working space for the tuplestores -- the access pattern does not fit
stack semantics. That's why I proposed having the ModifyTable nodes
own their own transition tuplestores, and passing them explicitly to
the sites where they're needed (various trigger.c routines), instead
of accessing them via that global stack. That's the approach of the
WIP patch I posted here:

/messages/by-id/CAEepm=1K7F08QPu+kGcNQ-bCcYBa3QX=9AeE=j0doCmgqVs4Tg@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#43)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

Robert Haas <robertmhaas@gmail.com> writes:

I don't really like either option, because, on the one hand, this is a
pretty invasive thing to go rip out -- maybe we don't have to rip out
the entire patch series, but just some of the later patches? -- and on
the other hand, keeping it in the tree will require work, and I'm
busy. But there don't seem to be any other options.

I was wondering if we could simply remove the syntax (and documentation),
and leave the infrastructure in place until things are fixed. I think
everybody wants this feature, there's just concern about stabilizing it
in time for v10.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Gierth (#46)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 5:01 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Robert> unless some other committer volunteers. (Of course, anyone
Robert> could step in to do the work, as Thomas already has to a
Robert> considerable degree, but without a committer involved it
Robert> doesn't fix the problem.)

I can probably take this on if needed.

I would certainly not complain if you are willing to do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#21)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Tue, Jun 6, 2017 at 8:19 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:

Also, ISTM that the code within ENRMetadataGetTupDesc() probably
requires more explanation, resource management wise.

Why so? I'm not saying you're wrong, but I don't see what's confusing
about the status quo. I may be missing something important.

Also, it's not clear why it should be okay that the new type of
ephemeral RTEs introduced don't have permissions checks. There are
currently cases where the user cannot see data that they inserted
themselves (e.g., through RETURNING), on the theory that a before row
trigger might have modified the final contents of the tuple in a way
that the original inserter isn't supposed to know details about.

Right, but the trigger has to be have been created by someone who has
TRIGGER permission on the table, and such an individual can trivially
steal the entire contents of every table row subsequently touched by
any DML statement by installing a FOR-EACH-ROW trigger that records
the contents of the tuple someplace. So the fact that they can now
also do that by installing a FOR-EACH-STATEMENT trigger doesn't seem
to be expanding the vulnerability surface. If anything, the new thing
gives trigger-creators less power than they have already, since the
tuplestore only lets you see what tuples were modified, whereas a
before-row trigger could block or alter the modifications.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#51Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#35)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 11:50 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

1. Keep the current behaviour. [...]

2. Make a code change that would split the 'new table' tuplestore in
two: an insert tuplestore and an update tuplestore (for new images;
old images could remain in the old tuplestore that is also used for
deletes) as Peter suggests. That raises two questions for me: (1)
Does it really make sense for the 'when' clause, which sounds like it
only controls when we fire a trigger, also to affect which transition
tuples it sees? There is something slightly fishy about that. (2)
Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
of the the two tuplestores? Trigger authors would need to be aware a
single statement can produce a mixture of updates and inserts, but
only if they explicitly invite such problems into their lives with
that incantation.

A third option would be for an AFTER INSERT OR UPDATE trigger to be
invoked twice, once for the inserts and once again for the updates.
No union required, but also surprising.

Any other ideas?

I discussed this off-list with Andrew Gierth and we came up with a
fourth way: Use separate insert and update tuplestores (as originally
suggested by Peter) and use the <trigger event> (INSERT, UPDATE) to
decide which one a trigger should see, as described in option 2 above,
but disallow INSERT OR UPDATE triggers with transition tables so that
we don't have to choose any of the surprising behaviours described
above. Triggers with multiple <trigger event>s are a PostgreSQL
extension, so by not allowing them with transition tables we don't
reduce our compliance. If you want to be invoked twice when you run
ON CONFLICT statements (like option 3 above) then you'll need to
create two triggers, one for INSERT and the other for UPDATE, and each
will see only the transition tuples resulting from inserts or updates
respectively.

The door is still open for us to allow INSERT OR UPDATE with
transition tables in future releases if someone can figure out what
that should do.

I will draft a patch for this.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#23)
1 attachment(s)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Wed, Jun 7, 2017 at 5:36 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I spent a couple of hours drafting a proof-of-concept to see if my
hunch was right. It seems to work correctly so far and isn't huge
(but certainly needs more testing and work):

6 files changed, 156 insertions(+), 109 deletions(-)

[Adding Andrew Gierth]

Here is a new version of the patch to fix transition tables with
wCTEs, rebased on top of
transition-tuples-from-child-tables-v10.patch[1]/messages/by-id/CAEepm=1Ei_0yN+vKTHHsTYdajaY59LBMUunxmpfhBU-eQQzqxA@mail.gmail.com which must be applied
first.

This is patch 2 of a stack of 3 patches addressing currently known
problems with transition tables.

[1]: /messages/by-id/CAEepm=1Ei_0yN+vKTHHsTYdajaY59LBMUunxmpfhBU-eQQzqxA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

transition-tuples-from-wctes-v2.patchapplication/octet-stream; name=transition-tuples-from-wctes-v2.patchDownload
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1458c929347..2395d6af7d1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1416,6 +1416,12 @@ BeginCopy(ParseState *pstate,
 					 errmsg("table \"%s\" does not have OIDs",
 							RelationGetRelationName(cstate->rel))));
 
+		/*
+		 * If there are any triggers with transition tables on the named
+		 * relation, we need to be prepared to capture transition tuples.
+		 */
+		cstate->transition_capture = MakeTransitionCaptureState(rel->trigdesc);
+
 		/* Initialize state for CopyFrom tuple routing. */
 		if (is_from && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
@@ -1440,14 +1446,6 @@ BeginCopy(ParseState *pstate,
 			cstate->partition_tuple_slot = partition_tuple_slot;
 
 			/*
-			 * If there are any triggers with transition tables on the named
-			 * relation, we need to be prepared to capture transition tuples
-			 * from child relations too.
-			 */
-			cstate->transition_capture =
-				MakeTransitionCaptureState(rel->trigdesc);
-
-			/*
 			 * If we are capturing transition tuples, they may need to be
 			 * converted from partition format back to partitioned table
 			 * format (this is only ever necessary if a BEFORE trigger
@@ -2803,7 +2801,7 @@ CopyFrom(CopyState cstate)
 		pq_endmsgread();
 
 	/* Execute AFTER STATEMENT insertion triggers */
-	ExecASInsertTriggers(estate, resultRelInfo);
+	ExecASInsertTriggers(estate, resultRelInfo, cstate->transition_capture);
 
 	/* Handle queued AFTER triggers */
 	AfterTriggerEndQuery(estate);
@@ -2931,7 +2929,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			cstate->cur_lineno = firstBufferedLineNo + i;
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 bufferedTuples[i],
-								 NIL, NULL);
+								 NIL, cstate->transition_capture);
 		}
 	}
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ac75ff8f8e4..f2231b0aed9 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2071,9 +2071,10 @@ FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc)
 /*
  * Make a TransitionCaptureState object from a given TriggerDesc.  The
  * resulting object holds the flags which control whether transition tuples
- * are collected when tables are modified.  This allows us to use the flags
- * from a parent table to control the collection of transition tuples from
- * child tables.
+ * are collected when tables are modified, and the tuplestores themselves.
+ * Note that we copy the flags from a parent table into this struct (rather
+ * than using each relation's TriggerDesc directly) so that we can use it to
+ * control the collection of transition tuples from child tables.
  *
  * If there are no triggers with transition tables configured for 'trigdesc',
  * then return NULL.
@@ -2091,17 +2092,49 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc)
 		(trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table ||
 		 trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table))
 	{
+		MemoryContext oldcxt;
+		ResourceOwner saveResourceOwner;
+
+		oldcxt = MemoryContextSwitchTo(TopTransactionContext);
+		saveResourceOwner = CurrentResourceOwner;
+
 		state = (TransitionCaptureState *)
 			palloc0(sizeof(TransitionCaptureState));
 		state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
 		state->tcs_update_old_table = trigdesc->trig_update_old_table;
 		state->tcs_update_new_table = trigdesc->trig_update_new_table;
 		state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+		PG_TRY();
+		{
+			CurrentResourceOwner = TopTransactionResourceOwner;
+			if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
+				state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+			if (trigdesc->trig_insert_new_table || trigdesc->trig_update_new_table)
+				state->tcs_new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+		}
+		PG_CATCH();
+		{
+			CurrentResourceOwner = saveResourceOwner;
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+		CurrentResourceOwner = saveResourceOwner;
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return state;
 }
 
+void
+DestroyTransitionCaptureState(TransitionCaptureState *tcs)
+{
+	if (tcs->tcs_new_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_new_tuplestore);
+	if (tcs->tcs_old_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_old_tuplestore);
+	pfree(tcs);
+}
+
 /*
  * Call a trigger function.
  *
@@ -2260,13 +2293,14 @@ ExecBSInsertTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_insert_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_INSERT,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 TupleTableSlot *
@@ -2343,7 +2377,6 @@ ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_insert_after_row) ||
-		(trigdesc && !transition_capture && trigdesc->trig_insert_new_table) ||
 		(transition_capture && transition_capture->tcs_insert_new_table))
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_INSERT,
 							  true, NULL, trigtuple,
@@ -2470,13 +2503,14 @@ ExecBSDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_delete_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 bool
@@ -2557,7 +2591,6 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_delete_after_row) ||
-		(trigdesc && !transition_capture && trigdesc->trig_delete_old_table) ||
 		(transition_capture && transition_capture->tcs_delete_old_table))
 	{
 		HeapTuple	trigtuple;
@@ -2684,7 +2717,8 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
@@ -2692,7 +2726,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  false, NULL, NULL, NIL,
 							  GetUpdatedColumns(relinfo, estate),
-							  NULL);
+							  transition_capture);
 }
 
 TupleTableSlot *
@@ -2823,9 +2857,6 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_update_after_row) ||
-		(trigdesc && !transition_capture &&
-		 (trigdesc->trig_update_old_table ||
-		  trigdesc->trig_update_new_table)) ||
 		(transition_capture &&
 		 (transition_capture->tcs_update_old_table ||
 		  transition_capture->tcs_update_new_table)))
@@ -3363,6 +3394,7 @@ typedef struct AfterTriggerSharedData
 	Oid			ats_tgoid;		/* the trigger's ID */
 	Oid			ats_relid;		/* the relation it's on */
 	CommandId	ats_firing_id;	/* ID for firing cycle */
+	TransitionCaptureState *ats_transition_capture;
 } AfterTriggerSharedData;
 
 typedef struct AfterTriggerEventData *AfterTriggerEvent;
@@ -3468,9 +3500,6 @@ typedef struct AfterTriggerEventList
  * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples
  * needed for the current query.
  *
- * old_tuplestores[query_depth] and new_tuplestores[query_depth] hold the
- * transition relations for the current query.
- *
  * maxquerydepth is just the allocated length of query_stack and the
  * tuplestores.
  *
@@ -3503,8 +3532,6 @@ typedef struct AfterTriggersData
 	AfterTriggerEventList *query_stack; /* events pending from each query */
 	Tuplestorestate **fdw_tuplestores;	/* foreign tuples for one row from
 										 * each query */
-	Tuplestorestate **old_tuplestores;	/* all old tuples from each query */
-	Tuplestorestate **new_tuplestores;	/* all new tuples from each query */
 	int			maxquerydepth;	/* allocated len of above array */
 	MemoryContext event_cxt;	/* memory context for events, if any */
 
@@ -3525,7 +3552,8 @@ static void AfterTriggerExecute(AfterTriggerEvent event,
 					Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2);
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture);
 static SetConstraintState SetConstraintStateCreate(int numalloc);
 static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
 static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
@@ -3534,8 +3562,6 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
 
 /*
  * Gets a current query transition tuplestore and initializes it if necessary.
- * This can be holding a single transition row tuple (in the case of an FDW)
- * or a transition table (for an AFTER trigger).
  */
 static Tuplestorestate *
 GetTriggerTransitionTuplestore(Tuplestorestate **tss)
@@ -3715,6 +3741,7 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 		if (newshared->ats_tgoid == evtshared->ats_tgoid &&
 			newshared->ats_relid == evtshared->ats_relid &&
 			newshared->ats_event == evtshared->ats_event &&
+			newshared->ats_transition_capture == evtshared->ats_transition_capture &&
 			newshared->ats_firing_id == 0)
 			break;
 	}
@@ -3826,7 +3853,8 @@ AfterTriggerExecute(AfterTriggerEvent event,
 					FmgrInfo *finfo, Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2)
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture)
 {
 	AfterTriggerShared evtshared = GetTriggerSharedData(event);
 	Oid			tgoid = evtshared->ats_tgoid;
@@ -3941,16 +3969,14 @@ AfterTriggerExecute(AfterTriggerEvent event,
 	/*
 	 * Set up the tuplestore information.
 	 */
-	if (LocTriggerData.tg_trigger->tgoldtable)
-		LocTriggerData.tg_oldtable =
-			GetTriggerTransitionTuplestore(afterTriggers.old_tuplestores);
-	else
-		LocTriggerData.tg_oldtable = NULL;
-	if (LocTriggerData.tg_trigger->tgnewtable)
-		LocTriggerData.tg_newtable =
-			GetTriggerTransitionTuplestore(afterTriggers.new_tuplestores);
-	else
-		LocTriggerData.tg_newtable = NULL;
+	LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
+	if (transition_capture != NULL)
+	{
+		if (LocTriggerData.tg_trigger->tgoldtable)
+			LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore;
+		if (LocTriggerData.tg_trigger->tgnewtable)
+			LocTriggerData.tg_newtable = transition_capture->tcs_new_tuplestore;
+	}
 
 	/*
 	 * Setup the remaining trigger information
@@ -4158,7 +4184,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				 * won't try to re-fire it.
 				 */
 				AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
-									per_tuple_context, slot1, slot2);
+									per_tuple_context, slot1, slot2,
+									evtshared->ats_transition_capture);
 
 				/*
 				 * Mark the event as done.
@@ -4232,8 +4259,6 @@ AfterTriggerBeginXact(void)
 	Assert(afterTriggers.state == NULL);
 	Assert(afterTriggers.query_stack == NULL);
 	Assert(afterTriggers.fdw_tuplestores == NULL);
-	Assert(afterTriggers.old_tuplestores == NULL);
-	Assert(afterTriggers.new_tuplestores == NULL);
 	Assert(afterTriggers.maxquerydepth == 0);
 	Assert(afterTriggers.event_cxt == NULL);
 	Assert(afterTriggers.events.head == NULL);
@@ -4278,8 +4303,6 @@ AfterTriggerEndQuery(EState *estate)
 {
 	AfterTriggerEventList *events;
 	Tuplestorestate *fdw_tuplestore;
-	Tuplestorestate *old_tuplestore;
-	Tuplestorestate *new_tuplestore;
 
 	/* Must be inside a query, too */
 	Assert(afterTriggers.query_depth >= 0);
@@ -4338,18 +4361,6 @@ AfterTriggerEndQuery(EState *estate)
 		tuplestore_end(fdw_tuplestore);
 		afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 	}
-	old_tuplestore = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-	if (old_tuplestore)
-	{
-		tuplestore_end(old_tuplestore);
-		afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-	}
-	new_tuplestore = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-	if (new_tuplestore)
-	{
-		tuplestore_end(new_tuplestore);
-		afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-	}
 	afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 
 	afterTriggers.query_depth--;
@@ -4463,8 +4474,6 @@ AfterTriggerEndXact(bool isCommit)
 	 */
 	afterTriggers.query_stack = NULL;
 	afterTriggers.fdw_tuplestores = NULL;
-	afterTriggers.old_tuplestores = NULL;
-	afterTriggers.new_tuplestores = NULL;
 	afterTriggers.maxquerydepth = 0;
 	afterTriggers.state = NULL;
 
@@ -4597,18 +4606,6 @@ AfterTriggerEndSubXact(bool isCommit)
 					tuplestore_end(ts);
 					afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 				}
-				ts = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-				}
-				ts = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-				}
 
 				afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 			}
@@ -4688,12 +4685,6 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			MemoryContextAllocZero(TopTransactionContext,
 								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 	else
@@ -4709,19 +4700,9 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			repalloc(afterTriggers.fdw_tuplestores,
 					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.old_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.new_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
 		/* Clear newly-allocated slots for subsequent lazy initialization. */
 		memset(afterTriggers.fdw_tuplestores + old_alloc,
 			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.old_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.new_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 
@@ -5237,20 +5218,6 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			original_insert_tuple =
 				transition_capture->tcs_original_insert_tuple;
 		}
-		else if (trigdesc != NULL)
-		{
-			/*
-			 * Check if we need to capture transition tuples for triggers
-			 * defined on this relation directly.  This case is useful for
-			 * cases like execReplication.c which don't set up a
-			 * TriggerCaptureState because they don't know how to work with
-			 * partitions.
-			 */
-			delete_old_table = trigdesc->trig_delete_old_table;
-			update_old_table = trigdesc->trig_update_old_table;
-			update_new_table = trigdesc->trig_update_new_table;
-			insert_new_table = trigdesc->trig_insert_new_table;
-		}
 
 		if ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
 			(event == TRIGGER_EVENT_UPDATE && update_old_table))
@@ -5258,9 +5225,8 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			Tuplestorestate *old_tuplestore;
 
 			Assert(oldtup != NULL);
-			old_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.old_tuplestores);
+			old_tuplestore = transition_capture->tcs_old_tuplestore;
+
 			if (map != NULL)
 			{
 				HeapTuple	converted = do_convert_tuple(oldtup, map);
@@ -5277,9 +5243,8 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			Tuplestorestate *new_tuplestore;
 
 			Assert(newtup != NULL);
-			new_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.new_tuplestores);
+			new_tuplestore = transition_capture->tcs_new_tuplestore;
+
 			if (original_insert_tuple != NULL)
 				tuplestore_puttuple(new_tuplestore, original_insert_tuple);
 			else if (map != NULL)
@@ -5465,6 +5430,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		new_shared.ats_tgoid = trigger->tgoid;
 		new_shared.ats_relid = RelationGetRelid(rel);
 		new_shared.ats_firing_id = 0;
+		new_shared.ats_transition_capture = transition_capture;
 
 		afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
 							 &new_event, &new_shared);
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6c86abca75d..94bfa99930e 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1437,14 +1437,18 @@ fireASTriggers(ModifyTableState *node)
 		case CMD_INSERT:
 			if (node->mt_onconflict == ONCONFLICT_UPDATE)
 				ExecASUpdateTriggers(node->ps.state,
-									 resultRelInfo);
-			ExecASInsertTriggers(node->ps.state, resultRelInfo);
+									 resultRelInfo,
+									 node->mt_transition_capture);
+			ExecASInsertTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_UPDATE:
-			ExecASUpdateTriggers(node->ps.state, resultRelInfo);
+			ExecASUpdateTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_DELETE:
-			ExecASDeleteTriggers(node->ps.state, resultRelInfo);
+			ExecASDeleteTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		default:
 			elog(ERROR, "unknown operation");
@@ -2289,6 +2293,10 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	/* Free transition tables */
+	if (node->mt_transition_capture != NULL)
+		DestroyTransitionCaptureState(node->mt_transition_capture);
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 736f5de67dd..5ce57966245 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -42,8 +42,8 @@ typedef struct TriggerData
 } TriggerData;
 
 /*
- * Meta-data to control the capture of old and new tuples into transition
- * tables from child tables.
+ * The state for capturing old and new tuples into transition tables for a
+ * single ModifyTable node.
  */
 typedef struct TransitionCaptureState
 {
@@ -72,6 +72,10 @@ typedef struct TransitionCaptureState
 	 * the original tuple directly.
 	 */
 	HeapTuple	tcs_original_insert_tuple;
+
+	/* The tuplestores backing the transition tables. */
+	Tuplestorestate *tcs_old_tuplestore;
+	Tuplestorestate *tcs_new_tuplestore;
 } TransitionCaptureState;
 
 /*
@@ -162,13 +166,15 @@ extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);
 
 extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
 extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc);
+extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs);
 
 extern void FreeTriggerDesc(TriggerDesc *trigdesc);
 
 extern void ExecBSInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASInsertTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo,
 					 TupleTableSlot *slot);
@@ -183,7 +189,8 @@ extern TupleTableSlot *ExecIRInsertTriggers(EState *estate,
 extern void ExecBSDeleteTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASDeleteTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern bool ExecBRDeleteTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
@@ -200,7 +207,8 @@ extern bool ExecIRDeleteTriggers(EState *estate,
 extern void ExecBSUpdateTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASUpdateTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRUpdateTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 995410f1aae..0261d66e5c5 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2194,6 +2194,26 @@ DETAIL:  ROW triggers with transition tables are not supported in inheritance hi
 drop trigger child_row_trig on child;
 alter table child inherit parent;
 drop table child, parent;
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert on table1 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger table2_trig
+  after insert on table2 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+NOTICE:  trigger = table2_trig, new table = ("hello world")
+NOTICE:  trigger = table1_trig, new table = (42)
+drop table table1;
+drop table table2;
 -- cleanup
 drop function dump_insert();
 drop function dump_update();
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 683a5f1e5c4..128126a0f1a 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1704,6 +1704,27 @@ alter table child inherit parent;
 
 drop table child, parent;
 
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert on table1 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger table2_trig
+  after insert on table2 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+
+drop table table1;
+drop table table2;
+
 -- cleanup
 drop function dump_insert();
 drop function dump_update();
#53Robert Haas
robertmhaas@gmail.com
In reply to: Thomas Munro (#52)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Fri, Jun 9, 2017 at 12:00 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Wed, Jun 7, 2017 at 5:36 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

I spent a couple of hours drafting a proof-of-concept to see if my
hunch was right. It seems to work correctly so far and isn't huge
(but certainly needs more testing and work):

6 files changed, 156 insertions(+), 109 deletions(-)

[Adding Andrew Gierth]

Here is a new version of the patch to fix transition tables with
wCTEs, rebased on top of
transition-tuples-from-child-tables-v10.patch[1] which must be applied
first.

This is patch 2 of a stack of 3 patches addressing currently known
problems with transition tables.

I don't see a reason why MakeTransitionCaptureState needs to force the
tuplestores into TopTransactionContext or make them owned by
TopTransactionResourceOwner. I mean, it was like that before, but
afterTriggers is a global variable and, potentially, there could still
be a pointer accessible through it to a tuplestore linked from it even
after the corresponding subtransaction aborted, possibly causing some
cleanup code to trip and fall over. But that can't be used to justify
this case, because the TransitionCaptureState is only reached through
the PlanState tree; if that goes away, how is anybody going to
accidentally follow a pointer to the now-absent tuplestore?

The structure of AfterTriggerSaveEvent with this patch applied looks
pretty weird. IIUC, whenever we enter the block guarded by if
(row_trigger), then either (a) transition_capture != NULL or (b) each
of the four "if" statements inside that block will separately decide
to do nothing. I think now that you're requiring a transition_capture
for all instances where transition tuplestores are used, you could
simplify this code. Maybe just change the outer "if" statement to if
(row_trigger) and then Assert(transition_capture != NULL)?

Not this patch's problem directly, but while scrutinizing this it
crossed my mind that we would need to prohibit constraint triggers
with transition tables. It turns out that we do, in the parser:

create constraint trigger table2_trig
after insert on table2 referencing new table as new_table
for each statement execute procedure dump_insert();
ERROR: syntax error at or near "referencing"

Possibly it would be better to allow the syntax and error out in
CreateTrigger(), so that we can give a better error message. It's
certainly not obvious from the syntax summary produced by \h CREATE
TRIGGER why this doesn't work. This might be more future-proof, too,
if somebody someday adds support for REFERENCING { OLD | NEW } ROW to
constraint triggers and fails to realize that there's not a check
anywhere outside the parser for the table case.

I don't see anything else wrong with this, and it seems like it solves
the reported problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Robert Haas (#50)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Thu, Jun 8, 2017 at 3:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 6, 2017 at 8:19 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:

Also, ISTM that the code within ENRMetadataGetTupDesc() probably
requires more explanation, resource management wise.

Why so? I'm not saying you're wrong, but I don't see what's confusing
about the status quo. I may be missing something important.

Perhaps nothing. I just thought it looked a bit odd to rely on the
target relation's TupleDesc like that. It's not quite the same as the
foreign table tuplestore stuff, because that has its own relkind, and
has a relation RTE in the parser.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#53)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

Robert Haas wrote:

Not this patch's problem directly, but while scrutinizing this it
crossed my mind that we would need to prohibit constraint triggers
with transition tables. It turns out that we do, in the parser:

create constraint trigger table2_trig
after insert on table2 referencing new table as new_table
for each statement execute procedure dump_insert();
ERROR: syntax error at or near "referencing"

Possibly it would be better to allow the syntax and error out in
CreateTrigger(), so that we can give a better error message. It's
certainly not obvious from the syntax summary produced by \h CREATE
TRIGGER why this doesn't work.

Yeah, I agree. This doesn't look like actual protection, but just a
"happy accident", and the syntax error message is not helpful either.
We could keep it very simple by just throwing the error right there in
gram.y's production, adding TriggerReferencing in the right place in the
CREATE CONSTRAINT TRIGGER production -- at this stage there doesn't seem
to be a need to expand this any further.

This might be more future-proof, too,
if somebody someday adds support for REFERENCING { OLD | NEW } ROW to
constraint triggers and fails to realize that there's not a check
anywhere outside the parser for the table case.

I suppose in the future it would make sense to allow for-each-statement
constraint triggers with transition tables. It'd probably a useful way
to optimize FK checks for bulk operations.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#56Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Robert Haas (#53)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Robert> I don't see a reason why MakeTransitionCaptureState needs to
Robert> force the tuplestores into TopTransactionContext or make them
Robert> owned by TopTransactionResourceOwner.

Nor do I, and I'm pretty sure it's leaking memory wholesale within a
transaction if you have aborted subxacts.

e.g. a few iterations of this, on a table with an appropriate trigger:

savepoint s1;
insert into foo
select i, case when i < 100000 then repeat('a',100)
else repeat('a',1/(i-100000)) end
from generate_series(1,100000) i;
rollback to s1;

This is a bit contrived of course but it shows that there's missing
cleanup somewhere, either in the form of poor choice of context or
missing code in the subxact cleanup.

Robert> I mean, it was like that before, but afterTriggers is a global
Robert> variable and, potentially, there could still be a pointer
Robert> accessible through it to a tuplestore linked from it even after
Robert> the corresponding subtransaction aborted, possibly causing some
Robert> cleanup code to trip and fall over. But that can't be used to
Robert> justify this case, because the TransitionCaptureState is only
Robert> reached through the PlanState tree; if that goes away, how is
Robert> anybody going to accidentally follow a pointer to the
Robert> now-absent tuplestore?

For per-row triggers with transition tables, a pointer to the transition
capture state ends up in the shared-data record in the event queue?

--
Andrew.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Andrew Gierth (#56)
1 attachment(s)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Mon, Jun 12, 2017 at 9:29 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Robert> I don't see a reason why MakeTransitionCaptureState needs to
Robert> force the tuplestores into TopTransactionContext or make them
Robert> owned by TopTransactionResourceOwner.

Nor do I, and I'm pretty sure it's leaking memory wholesale within a
transaction if you have aborted subxacts.

e.g. a few iterations of this, on a table with an appropriate trigger:

savepoint s1;
insert into foo
select i, case when i < 100000 then repeat('a',100)
else repeat('a',1/(i-100000)) end
from generate_series(1,100000) i;
rollback to s1;

This is a bit contrived of course but it shows that there's missing
cleanup somewhere, either in the form of poor choice of context or
missing code in the subxact cleanup.

Right, thanks. I think the fix for this is to use
CurTransactionContext (for memory cleanup) and
CurTransactionResourceOwner (for temporary file cleanup, in case the
tuplestores spill to disk). The attached does it that way.

With the previous version of the patch, I could see temporary files
accumulating under base/pgsql_tmp from each aborted subtransaction
when I set work_mem = '64kB' to provoke spillage. With the attached
version, the happy path cleans up (because ExecEndModifyTable calls
DestroyTransitionCaptureState), and the error path does too (because
the subxact's memory and temporary files are automatically cleaned
up).

Robert> I mean, it was like that before, but afterTriggers is a global
Robert> variable and, potentially, there could still be a pointer
Robert> accessible through it to a tuplestore linked from it even after
Robert> the corresponding subtransaction aborted, possibly causing some
Robert> cleanup code to trip and fall over. But that can't be used to
Robert> justify this case, because the TransitionCaptureState is only
Robert> reached through the PlanState tree; if that goes away, how is
Robert> anybody going to accidentally follow a pointer to the
Robert> now-absent tuplestore?

For per-row triggers with transition tables, a pointer to the transition
capture state ends up in the shared-data record in the event queue?

Yes. But the only way for event queue data to last longer than the
execution of the current statement is if it's a deferred trigger.
Only constraint triggers can be deferred, and constraint triggers are
not allowed to have transition tables. If we wanted to support that
(for example, to handle set-orient FK checks as has been discussed),
we'd need to change the lifetime of the TransitionCaptureState object
(it would need to outlast ModifyTableState). I've added a note to
that effect to MakeTransitionCaptureState.

So far there seem to be no objections to the inclusion of the pointer
in the event queue that records which transition tables each trigger
invocation should see...? The extra memory consumption per chunk is
limited by the number of ModifyTable nodes, so it's not going to be a
memory hog, but I thought someone might not like it on other grounds.

On Sat, Jun 10, 2017 at 7:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

The structure of AfterTriggerSaveEvent with this patch applied looks
pretty weird. IIUC, whenever we enter the block guarded by if
(row_trigger), then either (a) transition_capture != NULL or (b) each
of the four "if" statements inside that block will separately decide
to do nothing. I think now that you're requiring a transition_capture
for all instances where transition tuplestores are used, you could
simplify this code. Maybe just change the outer "if" statement to if
(row_trigger) and then Assert(transition_capture != NULL)?

Yeah, that was silly, and also the comment was misleading. It is
possible for row_trigger to be true and transition_capture to be NULL,
so I did it slightly differently. Fixed in the attached.

Thanks both for the review. New version of patch #2 attached.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

transition-tuples-from-wctes-v3.patchapplication/octet-stream; name=transition-tuples-from-wctes-v3.patchDownload
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5e02b5feba5..043d971fa71 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1416,6 +1416,12 @@ BeginCopy(ParseState *pstate,
 					 errmsg("table \"%s\" does not have OIDs",
 							RelationGetRelationName(cstate->rel))));
 
+		/*
+		 * If there are any triggers with transition tables on the named
+		 * relation, we need to be prepared to capture transition tuples.
+		 */
+		cstate->transition_capture = MakeTransitionCaptureState(rel->trigdesc);
+
 		/* Initialize state for CopyFrom tuple routing. */
 		if (is_from && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
@@ -1440,14 +1446,6 @@ BeginCopy(ParseState *pstate,
 			cstate->partition_tuple_slot = partition_tuple_slot;
 
 			/*
-			 * If there are any triggers with transition tables on the named
-			 * relation, we need to be prepared to capture transition tuples
-			 * from child relations too.
-			 */
-			cstate->transition_capture =
-				MakeTransitionCaptureState(rel->trigdesc);
-
-			/*
 			 * If we are capturing transition tuples, they may need to be
 			 * converted from partition format back to partitioned table
 			 * format (this is only ever necessary if a BEFORE trigger
@@ -2807,7 +2805,7 @@ CopyFrom(CopyState cstate)
 		pq_endmsgread();
 
 	/* Execute AFTER STATEMENT insertion triggers */
-	ExecASInsertTriggers(estate, resultRelInfo);
+	ExecASInsertTriggers(estate, resultRelInfo, cstate->transition_capture);
 
 	/* Handle queued AFTER triggers */
 	AfterTriggerEndQuery(estate);
@@ -2935,7 +2933,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			cstate->cur_lineno = firstBufferedLineNo + i;
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 bufferedTuples[i],
-								 NIL, NULL);
+								 NIL, cstate->transition_capture);
 		}
 	}
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ac75ff8f8e4..e0c7c96e63a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2071,9 +2071,10 @@ FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc)
 /*
  * Make a TransitionCaptureState object from a given TriggerDesc.  The
  * resulting object holds the flags which control whether transition tuples
- * are collected when tables are modified.  This allows us to use the flags
- * from a parent table to control the collection of transition tuples from
- * child tables.
+ * are collected when tables are modified, and the tuplestores themselves.
+ * Note that we copy the flags from a parent table into this struct (rather
+ * than using each relation's TriggerDesc directly) so that we can use it to
+ * control the collection of transition tuples from child tables.
  *
  * If there are no triggers with transition tables configured for 'trigdesc',
  * then return NULL.
@@ -2091,17 +2092,68 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc)
 		(trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table ||
 		 trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table))
 	{
+		MemoryContext oldcxt;
+		ResourceOwner saveResourceOwner;
+
+		/*
+		 * Normally DestroyTransitionCaptureState should be called after
+		 * executing all AFTER triggers for the current statement.
+		 *
+		 * To handle error cleanup, TransitionCaptureState and the tuplestores
+		 * it contains will live in the current [sub]transaction's memory
+		 * context.  Likewise for the current resource owner, because we also
+		 * want to clean up temporary files spilled to disk by the tuplestore
+		 * in that scenario.  This scope is sufficient, because AFTER triggers
+		 * with transition tables cannot be deferred (only constraint triggers
+		 * can be deferred, and constraint triggers cannot have transition
+		 * tables).  The AFTER trigger queue may contain pointers to this
+		 * TransitionCaptureState, but any such entries will be processed or
+		 * discarded before the end of the current [sub]transaction.
+		 *
+		 * If a future release allows deferred triggers with transition
+		 * tables, we'll need to reconsider the scope of the
+		 * TransitionCaptureState object.
+		 */
+		oldcxt = MemoryContextSwitchTo(CurTransactionContext);
+		saveResourceOwner = CurrentResourceOwner;
+
 		state = (TransitionCaptureState *)
 			palloc0(sizeof(TransitionCaptureState));
 		state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
 		state->tcs_update_old_table = trigdesc->trig_update_old_table;
 		state->tcs_update_new_table = trigdesc->trig_update_new_table;
 		state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+		PG_TRY();
+		{
+			CurrentResourceOwner = CurTransactionResourceOwner;
+			if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
+				state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+			if (trigdesc->trig_insert_new_table || trigdesc->trig_update_new_table)
+				state->tcs_new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+		}
+		PG_CATCH();
+		{
+			CurrentResourceOwner = saveResourceOwner;
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+		CurrentResourceOwner = saveResourceOwner;
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return state;
 }
 
+void
+DestroyTransitionCaptureState(TransitionCaptureState *tcs)
+{
+	if (tcs->tcs_new_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_new_tuplestore);
+	if (tcs->tcs_old_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_old_tuplestore);
+	pfree(tcs);
+}
+
 /*
  * Call a trigger function.
  *
@@ -2260,13 +2312,14 @@ ExecBSInsertTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_insert_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_INSERT,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 TupleTableSlot *
@@ -2343,7 +2396,6 @@ ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_insert_after_row) ||
-		(trigdesc && !transition_capture && trigdesc->trig_insert_new_table) ||
 		(transition_capture && transition_capture->tcs_insert_new_table))
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_INSERT,
 							  true, NULL, trigtuple,
@@ -2470,13 +2522,14 @@ ExecBSDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_delete_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 bool
@@ -2557,7 +2610,6 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_delete_after_row) ||
-		(trigdesc && !transition_capture && trigdesc->trig_delete_old_table) ||
 		(transition_capture && transition_capture->tcs_delete_old_table))
 	{
 		HeapTuple	trigtuple;
@@ -2684,7 +2736,8 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
@@ -2692,7 +2745,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  false, NULL, NULL, NIL,
 							  GetUpdatedColumns(relinfo, estate),
-							  NULL);
+							  transition_capture);
 }
 
 TupleTableSlot *
@@ -2823,9 +2876,6 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_update_after_row) ||
-		(trigdesc && !transition_capture &&
-		 (trigdesc->trig_update_old_table ||
-		  trigdesc->trig_update_new_table)) ||
 		(transition_capture &&
 		 (transition_capture->tcs_update_old_table ||
 		  transition_capture->tcs_update_new_table)))
@@ -3363,6 +3413,7 @@ typedef struct AfterTriggerSharedData
 	Oid			ats_tgoid;		/* the trigger's ID */
 	Oid			ats_relid;		/* the relation it's on */
 	CommandId	ats_firing_id;	/* ID for firing cycle */
+	TransitionCaptureState *ats_transition_capture;
 } AfterTriggerSharedData;
 
 typedef struct AfterTriggerEventData *AfterTriggerEvent;
@@ -3468,9 +3519,6 @@ typedef struct AfterTriggerEventList
  * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples
  * needed for the current query.
  *
- * old_tuplestores[query_depth] and new_tuplestores[query_depth] hold the
- * transition relations for the current query.
- *
  * maxquerydepth is just the allocated length of query_stack and the
  * tuplestores.
  *
@@ -3503,8 +3551,6 @@ typedef struct AfterTriggersData
 	AfterTriggerEventList *query_stack; /* events pending from each query */
 	Tuplestorestate **fdw_tuplestores;	/* foreign tuples for one row from
 										 * each query */
-	Tuplestorestate **old_tuplestores;	/* all old tuples from each query */
-	Tuplestorestate **new_tuplestores;	/* all new tuples from each query */
 	int			maxquerydepth;	/* allocated len of above array */
 	MemoryContext event_cxt;	/* memory context for events, if any */
 
@@ -3525,7 +3571,8 @@ static void AfterTriggerExecute(AfterTriggerEvent event,
 					Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2);
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture);
 static SetConstraintState SetConstraintStateCreate(int numalloc);
 static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
 static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
@@ -3534,8 +3581,6 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
 
 /*
  * Gets a current query transition tuplestore and initializes it if necessary.
- * This can be holding a single transition row tuple (in the case of an FDW)
- * or a transition table (for an AFTER trigger).
  */
 static Tuplestorestate *
 GetTriggerTransitionTuplestore(Tuplestorestate **tss)
@@ -3715,6 +3760,7 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 		if (newshared->ats_tgoid == evtshared->ats_tgoid &&
 			newshared->ats_relid == evtshared->ats_relid &&
 			newshared->ats_event == evtshared->ats_event &&
+			newshared->ats_transition_capture == evtshared->ats_transition_capture &&
 			newshared->ats_firing_id == 0)
 			break;
 	}
@@ -3826,7 +3872,8 @@ AfterTriggerExecute(AfterTriggerEvent event,
 					FmgrInfo *finfo, Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2)
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture)
 {
 	AfterTriggerShared evtshared = GetTriggerSharedData(event);
 	Oid			tgoid = evtshared->ats_tgoid;
@@ -3941,16 +3988,14 @@ AfterTriggerExecute(AfterTriggerEvent event,
 	/*
 	 * Set up the tuplestore information.
 	 */
-	if (LocTriggerData.tg_trigger->tgoldtable)
-		LocTriggerData.tg_oldtable =
-			GetTriggerTransitionTuplestore(afterTriggers.old_tuplestores);
-	else
-		LocTriggerData.tg_oldtable = NULL;
-	if (LocTriggerData.tg_trigger->tgnewtable)
-		LocTriggerData.tg_newtable =
-			GetTriggerTransitionTuplestore(afterTriggers.new_tuplestores);
-	else
-		LocTriggerData.tg_newtable = NULL;
+	LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
+	if (transition_capture != NULL)
+	{
+		if (LocTriggerData.tg_trigger->tgoldtable)
+			LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore;
+		if (LocTriggerData.tg_trigger->tgnewtable)
+			LocTriggerData.tg_newtable = transition_capture->tcs_new_tuplestore;
+	}
 
 	/*
 	 * Setup the remaining trigger information
@@ -4158,7 +4203,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				 * won't try to re-fire it.
 				 */
 				AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
-									per_tuple_context, slot1, slot2);
+									per_tuple_context, slot1, slot2,
+									evtshared->ats_transition_capture);
 
 				/*
 				 * Mark the event as done.
@@ -4232,8 +4278,6 @@ AfterTriggerBeginXact(void)
 	Assert(afterTriggers.state == NULL);
 	Assert(afterTriggers.query_stack == NULL);
 	Assert(afterTriggers.fdw_tuplestores == NULL);
-	Assert(afterTriggers.old_tuplestores == NULL);
-	Assert(afterTriggers.new_tuplestores == NULL);
 	Assert(afterTriggers.maxquerydepth == 0);
 	Assert(afterTriggers.event_cxt == NULL);
 	Assert(afterTriggers.events.head == NULL);
@@ -4278,8 +4322,6 @@ AfterTriggerEndQuery(EState *estate)
 {
 	AfterTriggerEventList *events;
 	Tuplestorestate *fdw_tuplestore;
-	Tuplestorestate *old_tuplestore;
-	Tuplestorestate *new_tuplestore;
 
 	/* Must be inside a query, too */
 	Assert(afterTriggers.query_depth >= 0);
@@ -4338,18 +4380,6 @@ AfterTriggerEndQuery(EState *estate)
 		tuplestore_end(fdw_tuplestore);
 		afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 	}
-	old_tuplestore = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-	if (old_tuplestore)
-	{
-		tuplestore_end(old_tuplestore);
-		afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-	}
-	new_tuplestore = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-	if (new_tuplestore)
-	{
-		tuplestore_end(new_tuplestore);
-		afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-	}
 	afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 
 	afterTriggers.query_depth--;
@@ -4463,8 +4493,6 @@ AfterTriggerEndXact(bool isCommit)
 	 */
 	afterTriggers.query_stack = NULL;
 	afterTriggers.fdw_tuplestores = NULL;
-	afterTriggers.old_tuplestores = NULL;
-	afterTriggers.new_tuplestores = NULL;
 	afterTriggers.maxquerydepth = 0;
 	afterTriggers.state = NULL;
 
@@ -4597,18 +4625,6 @@ AfterTriggerEndSubXact(bool isCommit)
 					tuplestore_end(ts);
 					afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 				}
-				ts = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-				}
-				ts = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-				}
 
 				afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 			}
@@ -4688,12 +4704,6 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			MemoryContextAllocZero(TopTransactionContext,
 								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 	else
@@ -4709,19 +4719,9 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			repalloc(afterTriggers.fdw_tuplestores,
 					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.old_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.new_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
 		/* Clear newly-allocated slots for subsequent lazy initialization. */
 		memset(afterTriggers.fdw_tuplestores + old_alloc,
 			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.old_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.new_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 
@@ -5206,51 +5206,17 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		AfterTriggerEnlargeQueryState();
 
 	/*
-	 * If the relation has AFTER ... FOR EACH ROW triggers, capture rows into
-	 * transition tuplestores for this depth.
+	 * If the directly named relation has any triggers with transition tables,
+	 * then we need to capture transition tuples.
 	 */
-	if (row_trigger)
+	if (row_trigger && transition_capture != NULL)
 	{
-		HeapTuple original_insert_tuple = NULL;
-		TupleConversionMap *map = NULL;
-		bool delete_old_table = false;
-		bool update_old_table = false;
-		bool update_new_table = false;
-		bool insert_new_table = false;
-
-		if (transition_capture != NULL)
-		{
-			/*
-			 * A TransitionCaptureState object was provided to tell us which
-			 * tuples to capture based on a parent table named in a DML
-			 * statement.  We may be dealing with a child table with an
-			 * incompatible TupleDescriptor, in which case we'll need a map to
-			 * convert them.  As a small optimization, we may receive the
-			 * original tuple from an insertion into a partitioned table to
-			 * avoid a wasteful parent->child->parent round trip.
-			 */
-			delete_old_table = transition_capture->tcs_delete_old_table;
-			update_old_table = transition_capture->tcs_update_old_table;
-			update_new_table = transition_capture->tcs_update_new_table;
-			insert_new_table = transition_capture->tcs_insert_new_table;
-			map = transition_capture->tcs_map;
-			original_insert_tuple =
-				transition_capture->tcs_original_insert_tuple;
-		}
-		else if (trigdesc != NULL)
-		{
-			/*
-			 * Check if we need to capture transition tuples for triggers
-			 * defined on this relation directly.  This case is useful for
-			 * cases like execReplication.c which don't set up a
-			 * TriggerCaptureState because they don't know how to work with
-			 * partitions.
-			 */
-			delete_old_table = trigdesc->trig_delete_old_table;
-			update_old_table = trigdesc->trig_update_old_table;
-			update_new_table = trigdesc->trig_update_new_table;
-			insert_new_table = trigdesc->trig_insert_new_table;
-		}
+		HeapTuple original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		TupleConversionMap *map = transition_capture->tcs_map;
+		bool delete_old_table = transition_capture->tcs_delete_old_table;
+		bool update_old_table = transition_capture->tcs_update_old_table;
+		bool update_new_table = transition_capture->tcs_update_new_table;
+		bool insert_new_table = transition_capture->tcs_insert_new_table;;
 
 		if ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
 			(event == TRIGGER_EVENT_UPDATE && update_old_table))
@@ -5258,9 +5224,8 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			Tuplestorestate *old_tuplestore;
 
 			Assert(oldtup != NULL);
-			old_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.old_tuplestores);
+			old_tuplestore = transition_capture->tcs_old_tuplestore;
+
 			if (map != NULL)
 			{
 				HeapTuple	converted = do_convert_tuple(oldtup, map);
@@ -5277,9 +5242,8 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			Tuplestorestate *new_tuplestore;
 
 			Assert(newtup != NULL);
-			new_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.new_tuplestores);
+			new_tuplestore = transition_capture->tcs_new_tuplestore;
+
 			if (original_insert_tuple != NULL)
 				tuplestore_puttuple(new_tuplestore, original_insert_tuple);
 			else if (map != NULL)
@@ -5465,6 +5429,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		new_shared.ats_tgoid = trigger->tgoid;
 		new_shared.ats_relid = RelationGetRelid(rel);
 		new_shared.ats_firing_id = 0;
+		new_shared.ats_transition_capture = transition_capture;
 
 		afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
 							 &new_event, &new_shared);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 0a98fbfdf54..24586439de9 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -406,6 +406,12 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		ExecARInsertTriggers(estate, resultRelInfo, tuple,
 							 recheckIndexes, NULL);
 
+		/*
+		 * XXX we should in theory pass a TransitionCaptureState object to the
+		 * above to capture transition tuples, but after statement triggers
+		 * don't actually get fired by replication yet anyway
+		 */
+
 		list_free(recheckIndexes);
 	}
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index ebfc8139978..ece929b9a5d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1442,14 +1442,18 @@ fireASTriggers(ModifyTableState *node)
 		case CMD_INSERT:
 			if (node->mt_onconflict == ONCONFLICT_UPDATE)
 				ExecASUpdateTriggers(node->ps.state,
-									 resultRelInfo);
-			ExecASInsertTriggers(node->ps.state, resultRelInfo);
+									 resultRelInfo,
+									 node->mt_transition_capture);
+			ExecASInsertTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_UPDATE:
-			ExecASUpdateTriggers(node->ps.state, resultRelInfo);
+			ExecASUpdateTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_DELETE:
-			ExecASDeleteTriggers(node->ps.state, resultRelInfo);
+			ExecASDeleteTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		default:
 			elog(ERROR, "unknown operation");
@@ -2294,6 +2298,10 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	/* Free transition tables */
+	if (node->mt_transition_capture != NULL)
+		DestroyTransitionCaptureState(node->mt_transition_capture);
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 736f5de67dd..5ce57966245 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -42,8 +42,8 @@ typedef struct TriggerData
 } TriggerData;
 
 /*
- * Meta-data to control the capture of old and new tuples into transition
- * tables from child tables.
+ * The state for capturing old and new tuples into transition tables for a
+ * single ModifyTable node.
  */
 typedef struct TransitionCaptureState
 {
@@ -72,6 +72,10 @@ typedef struct TransitionCaptureState
 	 * the original tuple directly.
 	 */
 	HeapTuple	tcs_original_insert_tuple;
+
+	/* The tuplestores backing the transition tables. */
+	Tuplestorestate *tcs_old_tuplestore;
+	Tuplestorestate *tcs_new_tuplestore;
 } TransitionCaptureState;
 
 /*
@@ -162,13 +166,15 @@ extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);
 
 extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
 extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc);
+extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs);
 
 extern void FreeTriggerDesc(TriggerDesc *trigdesc);
 
 extern void ExecBSInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASInsertTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo,
 					 TupleTableSlot *slot);
@@ -183,7 +189,8 @@ extern TupleTableSlot *ExecIRInsertTriggers(EState *estate,
 extern void ExecBSDeleteTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASDeleteTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern bool ExecBRDeleteTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
@@ -200,7 +207,8 @@ extern bool ExecIRDeleteTriggers(EState *estate,
 extern void ExecBSUpdateTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASUpdateTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRUpdateTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 995410f1aae..0261d66e5c5 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2194,6 +2194,26 @@ DETAIL:  ROW triggers with transition tables are not supported in inheritance hi
 drop trigger child_row_trig on child;
 alter table child inherit parent;
 drop table child, parent;
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert on table1 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger table2_trig
+  after insert on table2 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+NOTICE:  trigger = table2_trig, new table = ("hello world")
+NOTICE:  trigger = table1_trig, new table = (42)
+drop table table1;
+drop table table2;
 -- cleanup
 drop function dump_insert();
 drop function dump_update();
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 683a5f1e5c4..128126a0f1a 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1704,6 +1704,27 @@ alter table child inherit parent;
 
 drop table child, parent;
 
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert on table1 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger table2_trig
+  after insert on table2 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+
+drop table table1;
+drop table table2;
+
 -- cleanup
 drop function dump_insert();
 drop function dump_update();
#58Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Thomas Munro (#57)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

"Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:

Thomas> Thanks both for the review. New version of patch #2 attached.

I'm looking to commit this soon; if anyone has any further comment now
would be a good time to speak up.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Andrew Gierth (#58)
1 attachment(s)
Re: PG10 transition tables, wCTEs and multiple operations on the same table

On Mon, Jun 19, 2017 at 11:06 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

"Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:

Thomas> Thanks both for the review. New version of patch #2 attached.

I'm looking to commit this soon; if anyone has any further comment now
would be a good time to speak up.

Here's patch #2 rebased for the recent reindent.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

transition-tuples-from-wctes-v4.patchapplication/octet-stream; name=transition-tuples-from-wctes-v4.patchDownload
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a4c02e6b7c5..f391828e74f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1416,6 +1416,12 @@ BeginCopy(ParseState *pstate,
 					 errmsg("table \"%s\" does not have OIDs",
 							RelationGetRelationName(cstate->rel))));
 
+		/*
+		 * If there are any triggers with transition tables on the named
+		 * relation, we need to be prepared to capture transition tuples.
+		 */
+		cstate->transition_capture = MakeTransitionCaptureState(rel->trigdesc);
+
 		/* Initialize state for CopyFrom tuple routing. */
 		if (is_from && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
@@ -1440,14 +1446,6 @@ BeginCopy(ParseState *pstate,
 			cstate->partition_tuple_slot = partition_tuple_slot;
 
 			/*
-			 * If there are any triggers with transition tables on the named
-			 * relation, we need to be prepared to capture transition tuples
-			 * from child relations too.
-			 */
-			cstate->transition_capture =
-				MakeTransitionCaptureState(rel->trigdesc);
-
-			/*
 			 * If we are capturing transition tuples, they may need to be
 			 * converted from partition format back to partitioned table
 			 * format (this is only ever necessary if a BEFORE trigger
@@ -2807,7 +2805,7 @@ CopyFrom(CopyState cstate)
 		pq_endmsgread();
 
 	/* Execute AFTER STATEMENT insertion triggers */
-	ExecASInsertTriggers(estate, resultRelInfo);
+	ExecASInsertTriggers(estate, resultRelInfo, cstate->transition_capture);
 
 	/* Handle queued AFTER triggers */
 	AfterTriggerEndQuery(estate);
@@ -2935,7 +2933,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 			cstate->cur_lineno = firstBufferedLineNo + i;
 			ExecARInsertTriggers(estate, resultRelInfo,
 								 bufferedTuples[i],
-								 NIL, NULL);
+								 NIL, cstate->transition_capture);
 		}
 	}
 
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index f902e0cdf5f..54db16c9090 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2071,9 +2071,10 @@ FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc)
 /*
  * Make a TransitionCaptureState object from a given TriggerDesc.  The
  * resulting object holds the flags which control whether transition tuples
- * are collected when tables are modified.  This allows us to use the flags
- * from a parent table to control the collection of transition tuples from
- * child tables.
+ * are collected when tables are modified, and the tuplestores themselves.
+ * Note that we copy the flags from a parent table into this struct (rather
+ * than using each relation's TriggerDesc directly) so that we can use it to
+ * control the collection of transition tuples from child tables.
  *
  * If there are no triggers with transition tables configured for 'trigdesc',
  * then return NULL.
@@ -2091,17 +2092,68 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc)
 		(trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table ||
 		 trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table))
 	{
+		MemoryContext oldcxt;
+		ResourceOwner saveResourceOwner;
+
+		/*
+		 * Normally DestroyTransitionCaptureState should be called after
+		 * executing all AFTER triggers for the current statement.
+		 *
+		 * To handle error cleanup, TransitionCaptureState and the tuplestores
+		 * it contains will live in the current [sub]transaction's memory
+		 * context.  Likewise for the current resource owner, because we also
+		 * want to clean up temporary files spilled to disk by the tuplestore
+		 * in that scenario.  This scope is sufficient, because AFTER triggers
+		 * with transition tables cannot be deferred (only constraint triggers
+		 * can be deferred, and constraint triggers cannot have transition
+		 * tables).  The AFTER trigger queue may contain pointers to this
+		 * TransitionCaptureState, but any such entries will be processed or
+		 * discarded before the end of the current [sub]transaction.
+		 *
+		 * If a future release allows deferred triggers with transition
+		 * tables, we'll need to reconsider the scope of the
+		 * TransitionCaptureState object.
+		 */
+		oldcxt = MemoryContextSwitchTo(CurTransactionContext);
+		saveResourceOwner = CurrentResourceOwner;
+
 		state = (TransitionCaptureState *)
 			palloc0(sizeof(TransitionCaptureState));
 		state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
 		state->tcs_update_old_table = trigdesc->trig_update_old_table;
 		state->tcs_update_new_table = trigdesc->trig_update_new_table;
 		state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+		PG_TRY();
+		{
+			CurrentResourceOwner = CurTransactionResourceOwner;
+			if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
+				state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+			if (trigdesc->trig_insert_new_table || trigdesc->trig_update_new_table)
+				state->tcs_new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+		}
+		PG_CATCH();
+		{
+			CurrentResourceOwner = saveResourceOwner;
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+		CurrentResourceOwner = saveResourceOwner;
+		MemoryContextSwitchTo(oldcxt);
 	}
 
 	return state;
 }
 
+void
+DestroyTransitionCaptureState(TransitionCaptureState *tcs)
+{
+	if (tcs->tcs_new_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_new_tuplestore);
+	if (tcs->tcs_old_tuplestore != NULL)
+		tuplestore_end(tcs->tcs_old_tuplestore);
+	pfree(tcs);
+}
+
 /*
  * Call a trigger function.
  *
@@ -2260,13 +2312,14 @@ ExecBSInsertTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASInsertTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_insert_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_INSERT,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 TupleTableSlot *
@@ -2343,7 +2396,6 @@ ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_insert_after_row) ||
-		(trigdesc && !transition_capture && trigdesc->trig_insert_new_table) ||
 		(transition_capture && transition_capture->tcs_insert_new_table))
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_INSERT,
 							  true, NULL, trigtuple,
@@ -2470,13 +2522,14 @@ ExecBSDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if (trigdesc && trigdesc->trig_delete_after_statement)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_DELETE,
-							  false, NULL, NULL, NIL, NULL, NULL);
+							  false, NULL, NULL, NIL, NULL, transition_capture);
 }
 
 bool
@@ -2557,7 +2610,6 @@ ExecARDeleteTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_delete_after_row) ||
-		(trigdesc && !transition_capture && trigdesc->trig_delete_old_table) ||
 		(transition_capture && transition_capture->tcs_delete_old_table))
 	{
 		HeapTuple	trigtuple;
@@ -2684,7 +2736,8 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 }
 
 void
-ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
+ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture)
 {
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
@@ -2692,7 +2745,7 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 		AfterTriggerSaveEvent(estate, relinfo, TRIGGER_EVENT_UPDATE,
 							  false, NULL, NULL, NIL,
 							  GetUpdatedColumns(relinfo, estate),
-							  NULL);
+							  transition_capture);
 }
 
 TupleTableSlot *
@@ -2823,9 +2876,6 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
 
 	if ((trigdesc && trigdesc->trig_update_after_row) ||
-		(trigdesc && !transition_capture &&
-		 (trigdesc->trig_update_old_table ||
-		  trigdesc->trig_update_new_table)) ||
 		(transition_capture &&
 		 (transition_capture->tcs_update_old_table ||
 		  transition_capture->tcs_update_new_table)))
@@ -3362,6 +3412,7 @@ typedef struct AfterTriggerSharedData
 	Oid			ats_tgoid;		/* the trigger's ID */
 	Oid			ats_relid;		/* the relation it's on */
 	CommandId	ats_firing_id;	/* ID for firing cycle */
+	TransitionCaptureState *ats_transition_capture;
 } AfterTriggerSharedData;
 
 typedef struct AfterTriggerEventData *AfterTriggerEvent;
@@ -3467,9 +3518,6 @@ typedef struct AfterTriggerEventList
  * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples
  * needed for the current query.
  *
- * old_tuplestores[query_depth] and new_tuplestores[query_depth] hold the
- * transition relations for the current query.
- *
  * maxquerydepth is just the allocated length of query_stack and the
  * tuplestores.
  *
@@ -3502,8 +3550,6 @@ typedef struct AfterTriggersData
 	AfterTriggerEventList *query_stack; /* events pending from each query */
 	Tuplestorestate **fdw_tuplestores;	/* foreign tuples for one row from
 										 * each query */
-	Tuplestorestate **old_tuplestores;	/* all old tuples from each query */
-	Tuplestorestate **new_tuplestores;	/* all new tuples from each query */
 	int			maxquerydepth;	/* allocated len of above array */
 	MemoryContext event_cxt;	/* memory context for events, if any */
 
@@ -3524,7 +3570,8 @@ static void AfterTriggerExecute(AfterTriggerEvent event,
 					Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2);
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture);
 static SetConstraintState SetConstraintStateCreate(int numalloc);
 static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
 static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
@@ -3533,8 +3580,6 @@ static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
 
 /*
  * Gets a current query transition tuplestore and initializes it if necessary.
- * This can be holding a single transition row tuple (in the case of an FDW)
- * or a transition table (for an AFTER trigger).
  */
 static Tuplestorestate *
 GetTriggerTransitionTuplestore(Tuplestorestate **tss)
@@ -3714,6 +3759,7 @@ afterTriggerAddEvent(AfterTriggerEventList *events,
 		if (newshared->ats_tgoid == evtshared->ats_tgoid &&
 			newshared->ats_relid == evtshared->ats_relid &&
 			newshared->ats_event == evtshared->ats_event &&
+			newshared->ats_transition_capture == evtshared->ats_transition_capture &&
 			newshared->ats_firing_id == 0)
 			break;
 	}
@@ -3825,7 +3871,8 @@ AfterTriggerExecute(AfterTriggerEvent event,
 					FmgrInfo *finfo, Instrumentation *instr,
 					MemoryContext per_tuple_context,
 					TupleTableSlot *trig_tuple_slot1,
-					TupleTableSlot *trig_tuple_slot2)
+					TupleTableSlot *trig_tuple_slot2,
+					TransitionCaptureState *transition_capture)
 {
 	AfterTriggerShared evtshared = GetTriggerSharedData(event);
 	Oid			tgoid = evtshared->ats_tgoid;
@@ -3940,16 +3987,14 @@ AfterTriggerExecute(AfterTriggerEvent event,
 	/*
 	 * Set up the tuplestore information.
 	 */
-	if (LocTriggerData.tg_trigger->tgoldtable)
-		LocTriggerData.tg_oldtable =
-			GetTriggerTransitionTuplestore(afterTriggers.old_tuplestores);
-	else
-		LocTriggerData.tg_oldtable = NULL;
-	if (LocTriggerData.tg_trigger->tgnewtable)
-		LocTriggerData.tg_newtable =
-			GetTriggerTransitionTuplestore(afterTriggers.new_tuplestores);
-	else
-		LocTriggerData.tg_newtable = NULL;
+	LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
+	if (transition_capture != NULL)
+	{
+		if (LocTriggerData.tg_trigger->tgoldtable)
+			LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore;
+		if (LocTriggerData.tg_trigger->tgnewtable)
+			LocTriggerData.tg_newtable = transition_capture->tcs_new_tuplestore;
+	}
 
 	/*
 	 * Setup the remaining trigger information
@@ -4157,7 +4202,8 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 				 * won't try to re-fire it.
 				 */
 				AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
-									per_tuple_context, slot1, slot2);
+									per_tuple_context, slot1, slot2,
+									evtshared->ats_transition_capture);
 
 				/*
 				 * Mark the event as done.
@@ -4231,8 +4277,6 @@ AfterTriggerBeginXact(void)
 	Assert(afterTriggers.state == NULL);
 	Assert(afterTriggers.query_stack == NULL);
 	Assert(afterTriggers.fdw_tuplestores == NULL);
-	Assert(afterTriggers.old_tuplestores == NULL);
-	Assert(afterTriggers.new_tuplestores == NULL);
 	Assert(afterTriggers.maxquerydepth == 0);
 	Assert(afterTriggers.event_cxt == NULL);
 	Assert(afterTriggers.events.head == NULL);
@@ -4277,8 +4321,6 @@ AfterTriggerEndQuery(EState *estate)
 {
 	AfterTriggerEventList *events;
 	Tuplestorestate *fdw_tuplestore;
-	Tuplestorestate *old_tuplestore;
-	Tuplestorestate *new_tuplestore;
 
 	/* Must be inside a query, too */
 	Assert(afterTriggers.query_depth >= 0);
@@ -4337,18 +4379,6 @@ AfterTriggerEndQuery(EState *estate)
 		tuplestore_end(fdw_tuplestore);
 		afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 	}
-	old_tuplestore = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-	if (old_tuplestore)
-	{
-		tuplestore_end(old_tuplestore);
-		afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-	}
-	new_tuplestore = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-	if (new_tuplestore)
-	{
-		tuplestore_end(new_tuplestore);
-		afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-	}
 	afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 
 	afterTriggers.query_depth--;
@@ -4462,8 +4492,6 @@ AfterTriggerEndXact(bool isCommit)
 	 */
 	afterTriggers.query_stack = NULL;
 	afterTriggers.fdw_tuplestores = NULL;
-	afterTriggers.old_tuplestores = NULL;
-	afterTriggers.new_tuplestores = NULL;
 	afterTriggers.maxquerydepth = 0;
 	afterTriggers.state = NULL;
 
@@ -4596,18 +4624,6 @@ AfterTriggerEndSubXact(bool isCommit)
 					tuplestore_end(ts);
 					afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
 				}
-				ts = afterTriggers.old_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.old_tuplestores[afterTriggers.query_depth] = NULL;
-				}
-				ts = afterTriggers.new_tuplestores[afterTriggers.query_depth];
-				if (ts)
-				{
-					tuplestore_end(ts);
-					afterTriggers.new_tuplestores[afterTriggers.query_depth] = NULL;
-				}
 
 				afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
 			}
@@ -4687,12 +4703,6 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			MemoryContextAllocZero(TopTransactionContext,
 								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			MemoryContextAllocZero(TopTransactionContext,
-								   new_alloc * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 	else
@@ -4708,19 +4718,9 @@ AfterTriggerEnlargeQueryState(void)
 		afterTriggers.fdw_tuplestores = (Tuplestorestate **)
 			repalloc(afterTriggers.fdw_tuplestores,
 					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.old_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.old_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
-		afterTriggers.new_tuplestores = (Tuplestorestate **)
-			repalloc(afterTriggers.new_tuplestores,
-					 new_alloc * sizeof(Tuplestorestate *));
 		/* Clear newly-allocated slots for subsequent lazy initialization. */
 		memset(afterTriggers.fdw_tuplestores + old_alloc,
 			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.old_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
-		memset(afterTriggers.new_tuplestores + old_alloc,
-			   0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
 		afterTriggers.maxquerydepth = new_alloc;
 	}
 
@@ -5205,51 +5205,17 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		AfterTriggerEnlargeQueryState();
 
 	/*
-	 * If the relation has AFTER ... FOR EACH ROW triggers, capture rows into
-	 * transition tuplestores for this depth.
+	 * If the directly named relation has any triggers with transition tables,
+	 * then we need to capture transition tuples.
 	 */
-	if (row_trigger)
+	if (row_trigger && transition_capture != NULL)
 	{
-		HeapTuple original_insert_tuple = NULL;
-		TupleConversionMap *map = NULL;
-		bool delete_old_table = false;
-		bool update_old_table = false;
-		bool update_new_table = false;
-		bool insert_new_table = false;
-
-		if (transition_capture != NULL)
-		{
-			/*
-			 * A TransitionCaptureState object was provided to tell us which
-			 * tuples to capture based on a parent table named in a DML
-			 * statement.  We may be dealing with a child table with an
-			 * incompatible TupleDescriptor, in which case we'll need a map to
-			 * convert them.  As a small optimization, we may receive the
-			 * original tuple from an insertion into a partitioned table to
-			 * avoid a wasteful parent->child->parent round trip.
-			 */
-			delete_old_table = transition_capture->tcs_delete_old_table;
-			update_old_table = transition_capture->tcs_update_old_table;
-			update_new_table = transition_capture->tcs_update_new_table;
-			insert_new_table = transition_capture->tcs_insert_new_table;
-			map = transition_capture->tcs_map;
-			original_insert_tuple =
-				transition_capture->tcs_original_insert_tuple;
-		}
-		else if (trigdesc != NULL)
-		{
-			/*
-			 * Check if we need to capture transition tuples for triggers
-			 * defined on this relation directly.  This case is useful for
-			 * cases like execReplication.c which don't set up a
-			 * TriggerCaptureState because they don't know how to work with
-			 * partitions.
-			 */
-			delete_old_table = trigdesc->trig_delete_old_table;
-			update_old_table = trigdesc->trig_update_old_table;
-			update_new_table = trigdesc->trig_update_new_table;
-			insert_new_table = trigdesc->trig_insert_new_table;
-		}
+		HeapTuple original_insert_tuple = transition_capture->tcs_original_insert_tuple;
+		TupleConversionMap *map = transition_capture->tcs_map;
+		bool delete_old_table = transition_capture->tcs_delete_old_table;
+		bool update_old_table = transition_capture->tcs_update_old_table;
+		bool update_new_table = transition_capture->tcs_update_new_table;
+		bool insert_new_table = transition_capture->tcs_insert_new_table;;
 
 		if ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
 			(event == TRIGGER_EVENT_UPDATE && update_old_table))
@@ -5257,9 +5223,8 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			Tuplestorestate *old_tuplestore;
 
 			Assert(oldtup != NULL);
-			old_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.old_tuplestores);
+			old_tuplestore = transition_capture->tcs_old_tuplestore;
+
 			if (map != NULL)
 			{
 				HeapTuple	converted = do_convert_tuple(oldtup, map);
@@ -5276,9 +5241,8 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 			Tuplestorestate *new_tuplestore;
 
 			Assert(newtup != NULL);
-			new_tuplestore =
-				GetTriggerTransitionTuplestore
-				(afterTriggers.new_tuplestores);
+			new_tuplestore = transition_capture->tcs_new_tuplestore;
+
 			if (original_insert_tuple != NULL)
 				tuplestore_puttuple(new_tuplestore, original_insert_tuple);
 			else if (map != NULL)
@@ -5464,6 +5428,7 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
 		new_shared.ats_tgoid = trigger->tgoid;
 		new_shared.ats_relid = RelationGetRelid(rel);
 		new_shared.ats_firing_id = 0;
+		new_shared.ats_transition_capture = transition_capture;
 
 		afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
 							 &new_event, &new_shared);
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d1edd38eccd..2e25048271f 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -406,6 +406,12 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		ExecARInsertTriggers(estate, resultRelInfo, tuple,
 							 recheckIndexes, NULL);
 
+		/*
+		 * XXX we should in theory pass a TransitionCaptureState object to the
+		 * above to capture transition tuples, but after statement triggers
+		 * don't actually get fired by replication yet anyway
+		 */
+
 		list_free(recheckIndexes);
 	}
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index f2534f20622..8d17425abea 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1442,14 +1442,18 @@ fireASTriggers(ModifyTableState *node)
 		case CMD_INSERT:
 			if (node->mt_onconflict == ONCONFLICT_UPDATE)
 				ExecASUpdateTriggers(node->ps.state,
-									 resultRelInfo);
-			ExecASInsertTriggers(node->ps.state, resultRelInfo);
+									 resultRelInfo,
+									 node->mt_transition_capture);
+			ExecASInsertTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_UPDATE:
-			ExecASUpdateTriggers(node->ps.state, resultRelInfo);
+			ExecASUpdateTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		case CMD_DELETE:
-			ExecASDeleteTriggers(node->ps.state, resultRelInfo);
+			ExecASDeleteTriggers(node->ps.state, resultRelInfo,
+								 node->mt_transition_capture);
 			break;
 		default:
 			elog(ERROR, "unknown operation");
@@ -2304,6 +2308,10 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	/* Free transition tables */
+	if (node->mt_transition_capture != NULL)
+		DestroyTransitionCaptureState(node->mt_transition_capture);
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 51a25c8ddc2..06199953fe9 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -42,8 +42,8 @@ typedef struct TriggerData
 } TriggerData;
 
 /*
- * Meta-data to control the capture of old and new tuples into transition
- * tables from child tables.
+ * The state for capturing old and new tuples into transition tables for a
+ * single ModifyTable node.
  */
 typedef struct TransitionCaptureState
 {
@@ -72,6 +72,10 @@ typedef struct TransitionCaptureState
 	 * the original tuple directly.
 	 */
 	HeapTuple	tcs_original_insert_tuple;
+
+	/* The tuplestores backing the transition tables. */
+	Tuplestorestate *tcs_old_tuplestore;
+	Tuplestorestate *tcs_new_tuplestore;
 } TransitionCaptureState;
 
 /*
@@ -162,13 +166,15 @@ extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);
 
 extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
 extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc);
+extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs);
 
 extern void FreeTriggerDesc(TriggerDesc *trigdesc);
 
 extern void ExecBSInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASInsertTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRInsertTriggers(EState *estate,
 					 ResultRelInfo *relinfo,
 					 TupleTableSlot *slot);
@@ -183,7 +189,8 @@ extern TupleTableSlot *ExecIRInsertTriggers(EState *estate,
 extern void ExecBSDeleteTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASDeleteTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern bool ExecBRDeleteTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
@@ -200,7 +207,8 @@ extern bool ExecIRDeleteTriggers(EState *estate,
 extern void ExecBSUpdateTriggers(EState *estate,
 					 ResultRelInfo *relinfo);
 extern void ExecASUpdateTriggers(EState *estate,
-					 ResultRelInfo *relinfo);
+					 ResultRelInfo *relinfo,
+					 TransitionCaptureState *transition_capture);
 extern TupleTableSlot *ExecBRUpdateTriggers(EState *estate,
 					 EPQState *epqstate,
 					 ResultRelInfo *relinfo,
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 995410f1aae..0261d66e5c5 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2194,6 +2194,26 @@ DETAIL:  ROW triggers with transition tables are not supported in inheritance hi
 drop trigger child_row_trig on child;
 alter table child inherit parent;
 drop table child, parent;
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert on table1 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger table2_trig
+  after insert on table2 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+NOTICE:  trigger = table2_trig, new table = ("hello world")
+NOTICE:  trigger = table1_trig, new table = (42)
+drop table table1;
+drop table table2;
 -- cleanup
 drop function dump_insert();
 drop function dump_update();
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 683a5f1e5c4..128126a0f1a 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1704,6 +1704,27 @@ alter table child inherit parent;
 
 drop table child, parent;
 
+--
+-- Verify behavior of queries with wCTEs, where multiple transition
+-- tuplestores can be active at the same time because there are
+-- multiple DML statements that might fire triggers with transition
+-- tables
+--
+create table table1 (a int);
+create table table2 (a text);
+create trigger table1_trig
+  after insert on table1 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+create trigger table2_trig
+  after insert on table2 referencing new table as new_table
+  for each statement execute procedure dump_insert();
+
+with wcte as (insert into table1 values (42))
+  insert into table2 values ('hello world');
+
+drop table table1;
+drop table table2;
+
 -- cleanup
 drop function dump_insert();
 drop function dump_update();