WIP: push AFTER-trigger execution into ModifyTable node
In http://archives.postgresql.org/message-id/26545.1255140067@sss.pgh.pa.us
I suggested that we should push the actual execution (not just queuing)
of non-deferred AFTER triggers into the new ModifyTable plan node.
The attached patch does that, and seems like a nice improvement since it
removes knowledge of trigger handling from a number of other places.
However the original objective was to allow EXPLAIN to associate trigger
runtimes with ModifyTable nodes, and I realized that this patch doesn't
accomplish that --- the trigger stats are still accumulated in the
executor-wide EState, not in the ModifyTable node. Right at the moment
we could cheat and have EXPLAIN print the trigger stats under
ModifyTable anyway, because there can be only one ModifyTable in any
plan tree. But that will fall down as soon as we try to let INSERT
RETURNING and friends execute within WITH clauses.
After poking around a bit I think it should be possible to keep the
trigger instrumentation data in the ModifyTable node instead of in
EState, and thereby allow EXPLAIN to know which node to blame the
trigger time on. This will require passing an extra parameter down
from nodeModifyTable into the trigger code, but none of those call paths
are very long. Still, it'll be a significantly more invasive patch by
the time it's done than what you see here.
So, before I go off and do that work: anybody have an objection to this
line of development? The main implication of changing to this approach
is that we'll be nailing down the assumption that each WITH (command
RETURNING) clause acts very much like a separate statement for
trigger purposes: it will fire BEFORE STATEMENT triggers at start,
and AFTER STATEMENT triggers at end, and actually execute all
non-deferred AFTER triggers, before we move on to executing the next
WITH clause or the main query.
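For concreteness, the sort of statement being discussed (table names invented for illustration; the WITH ... RETURNING syntax itself is still hypothetical at this point) would look something like:

```sql
-- Move rows from one table to another.  Under the semantics proposed
-- above, the DELETE's BEFORE STATEMENT trigger, its non-deferred AFTER
-- row triggers, and its AFTER STATEMENT trigger would all fire before
-- the outer INSERT starts executing.
WITH moved AS (
    DELETE FROM queue WHERE processed RETURNING *
)
INSERT INTO archive SELECT * FROM moved;
```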
regards, tom lane
Tom Lane wrote:
So, before I go off and do that work: anybody have an objection to this
line of development? The main implication of changing to this approach
is that we'll be nailing down the assumption that each WITH (command
RETURNING) clause acts very much like a separate statement for
trigger purposes: it will fire BEFORE STATEMENT triggers at start,
and AFTER STATEMENT triggers at end, and actually execute all
non-deferred AFTER triggers, before we move on to executing the next
WITH clause or the main query.
Like we've discussed before, WITH (.. RETURNING ..) is probably most
useful for moving rows from one table to another. When you're moving a
lot of rows around, there's some point where I believe this execution
strategy will be a lot slower than the traditional approach due to
storing the RETURNING results on disk. I've been thinking that in some
cases we could inline the CTE, which could yield quite a
significant performance benefit, so I'm not too keen on the approach
you're suggesting.
Regards,
Marko Tiikkaja
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
Like we've discussed before, WITH (.. RETURNING ..) is probably most
useful for moving rows from one table to another. When you're moving a
lot of rows around, there's some point where I believe this execution
strategy will be a lot slower than the traditional approach due to
storing the RETURNING results on disk. I've been thinking that in some
cases we could inline the CTE, which could yield quite a
significant performance benefit, so I'm not too keen on the approach
you're suggesting.
Well, this is what we need to nail down *now*. Are we going to say that
use of WITH(RETURNING) means you forfeit all guarantees about order of
trigger firing? Short of that, I don't believe that it is sane to think
about pipelining such things. And if we do do that, it sounds like a
security hole to me, because the owner of the trigger isn't the one who
agreed to forfeit predictability.
regards, tom lane
On Wed, Oct 28, 2009 at 8:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
Like we've discussed before, WITH (.. RETURNING ..) is probably most
useful for moving rows from one table to another. When you're moving a
lot of rows around, there's some point where I believe this execution
strategy will be a lot slower than the traditional approach due to
storing the RETURNING results on disk. I've been thinking that in some
cases we could inline the CTE, which could yield quite a
significant performance benefit, so I'm not too keen on the approach
you're suggesting.

Well, this is what we need to nail down *now*. Are we going to say that
use of WITH(RETURNING) means you forfeit all guarantees about order of
trigger firing? Short of that, I don't believe that it is sane to think
about pipelining such things. And if we do do that, it sounds like a
security hole to me, because the owner of the trigger isn't the one who
agreed to forfeit predictability.
I don't see why either behavior would be a security hole; we get to
define how the system behaves, and users have to write their triggers
to cope with that behavior. We don't want to throw random roadblocks
in the way of sanity, but users are not entitled to assume that no
future major release of PG will have semantics that are in any way
different from whichever release they're now running, especially for
features that don't even exist in the current release. If you have a
specific concern here, maybe you could provide an example.
To be honest, I'm not entirely comfortable with either behavior.
Pipelining a delete out of one table into an insert into another table
seems VERY useful to me, and I'd like us to have a way to do that. On
the other hand, in more complex cases, the fact that the effects of a
statement are normally not visible to that statement could lead to
some fairly confusing behavior, especially when triggers are involved.
So I don't really know what the right thing is. What I really want
is to provide both behaviors, but I'm not sure there's any sensible
way to do that, and even if there were it's not clear to me that users
will know which one they want.
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
To be honest, I'm not entirely comfortable with either behavior.
Pipelining a delete out of one table into an insert into another table
seems VERY useful to me, and I'd like us to have a way to do that. On
the other hand, in more complex cases, the fact that the effects of a
statement are normally not visible to that statement could lead to
some fairly confusing behavior, especially when triggers are involved.
Yup, that's right. The trigger timing is really the least of the issues.
If you consider that the various WITH clauses run concurrently with each
other and the main statement, rather than sequentially, then
(1) you have implementation-dependent, and not very desirable, behaviors
if any of the statements modify the same table;
(2) none of the statements will see each others' effects, which is
certainly going to confuse some people;
(3) I'm afraid that there's no good way to ensure that the modifying
statements run to completion if the top-level SELECT stops short of
reading that CTE to its end.
Pipelined execution would be nice but I really doubt that it's worth
what we'd have to give up to have it. The one-at-a-time behavior will
be simple to understand and reliable to use. Concurrent execution won't
be either.
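A contrived example of problems (1) and (2) (table name invented): if the sub-statements ran concurrently, neither UPDATE would see the other's effect, so the row's final balance would depend on which one happened to execute last:

```sql
-- With one-at-a-time execution the result is well-defined; with
-- concurrent execution it is implementation-dependent.
WITH a AS (UPDATE accounts SET balance = balance + 100 WHERE id = 1 RETURNING balance),
     b AS (UPDATE accounts SET balance = balance * 2   WHERE id = 1 RETURNING balance)
SELECT * FROM a, b;
```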
regards, tom lane
On Thu, Oct 29, 2009 at 7:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pipelined execution would be nice but I really doubt that it's worth
what we'd have to give up to have it. The one-at-a-time behavior will
be simple to understand and reliable to use. Concurrent execution won't
be either.
I think the ideal way to get pipelined execution will be to detect the
cases where it's safe, ie, no deletes, inserts, or updates, no
recursive calls, and only one call site, and inline the sql directly
into the callsite. Actually we could do that even if there are two
callsites if there are no volatile functions but estimating whether
that would be a win or a loss would be hard.
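For instance, a CTE like the following (table name invented) meets those conditions — read-only, no recursion, referenced exactly once — so it could in principle be flattened into the outer query:

```sql
-- No data modification, no recursion, single call site: a candidate
-- for inlining directly into the outer SELECT.
WITH recent AS (
    SELECT * FROM events WHERE created_at > current_date - 1
)
SELECT count(*) FROM recent;
```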
--
greg
Greg Stark wrote:
On Thu, Oct 29, 2009 at 7:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pipelined execution would be nice but I really doubt that it's worth
what we'd have to give up to have it. The one-at-a-time behavior will
be simple to understand and reliable to use. Concurrent execution won't
be either.

I think the ideal way to get pipelined execution will be to detect the
cases where it's safe, ie, no deletes, inserts, or updates, no
recursive calls, and only one call site, and inline the sql directly
into the callsite. Actually we could do that even if there are two
callsites if there are no volatile functions but estimating whether
that would be a win or a loss would be hard.
What I've had in mind is pipelining the execution only when it doesn't
have *any* impact on the outcome. This would mean only allowing it when
the top-level statement is either a SELECT or an INSERT. Also, UPDATEs
and DELETEs inside CTEs can't have the same result relations. Whether
or not we want to break the expected(?) behaviour for statement-level
triggers, I have no opinion one way or the other.
Regards,
Marko Tiikkaja
On Sat, Oct 31, 2009 at 5:00 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
Greg Stark wrote:
On Thu, Oct 29, 2009 at 7:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pipelined execution would be nice but I really doubt that it's worth
what we'd have to give up to have it. The one-at-a-time behavior will
be simple to understand and reliable to use. Concurrent execution won't
be either.

I think the ideal way to get pipelined execution will be to detect the
cases where it's safe, ie, no deletes, inserts, or updates, no
recursive calls, and only one call site, and inline the sql directly
into the callsite. Actually we could do that even if there are two
callsites if there are no volatile functions but estimating whether
that would be a win or a loss would be hard.

What I've had in mind is pipelining the execution only when it doesn't
have *any* impact on the outcome. This would mean only allowing it when
the top-level statement is either a SELECT or an INSERT. Also, UPDATEs
and DELETEs inside CTEs can't have the same result relations. Whether
or not we want to break the expected(?) behaviour for statement-level
triggers, I have no opinion one way or the other.
You'd also have to disallow the case when there are any triggers on
the INSERT, or where there are any triggers on anything else (because
they might access the target table of the INSERT). This will end up
being so restricted as to be useless.
...Robert
Robert Haas wrote:
You'd also have to disallow the case when there are any triggers on
the INSERT, or where there are any triggers on anything else (because
they might access the target table of the INSERT). This will end up
being so restricted as to be useless.
I might be wrong here, but I don't think it matters because those
trigger calls would have a different snapshot, right? Or am I missing
something?
Regards,
Marko Tiikkaja
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Oct 31, 2009 at 5:00 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:

What I've had in mind is pipelining the execution only when it doesn't
have *any* impact on the outcome. This would mean only allowing it when
the top-level statement is either a SELECT or an INSERT. Also, UPDATEs
and DELETEs inside CTEs can't have the same result relations. Whether
or not we want to break the expected(?) behaviour for statement-level
triggers, I have no opinion one way or the other.
You'd also have to disallow the case when there are any triggers on
the INSERT, or where there are any triggers on anything else (because
they might access the target table of the INSERT). This will end up
being so restricted as to be useless.
Well, it's already the case that BEFORE triggers shouldn't count on
seeing any particular subset of a statement's results completed.
We could define AFTER triggers as all being fired after the entire
statement is complete (which is not the direction my patch was headed
in, but I have no allegiance to that). So I think we could define our
way out of the trigger timing issue, and I don't see any big objection
to limiting pipelining to cases where the sub-statements' target tables
don't overlap.
However, this still doesn't address the problem of what happens when the
top-level select fails to read all of the CTE output (because it has a
LIMIT, or the client doesn't read all the output of a portal, etc etc).
Partially executing an update in such cases is no good.
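A minimal illustration of the hazard (table name invented):

```sql
-- If the LIMIT stops the scan after one row, has the DELETE removed
-- one row, all rows, or something in between?
WITH d AS (
    DELETE FROM queue RETURNING *
)
SELECT * FROM d LIMIT 1;
```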
regards, tom lane
Tom Lane wrote:
However, this still doesn't address the problem of what happens when the
top-level select fails to read all of the CTE output (because it has a
LIMIT, or the client doesn't read all the output of a portal, etc etc).
Partially executing an update in such cases is no good.
I've previously thought about making the CTE aware of the LIMIT,
similarly to a top-N sort, but I don't think it's worth it. If we have
a LIMIT, we could just fall back to statement-at-a-time execution.
I'm not sure which cases you mean by "the client doesn't read all
the output of a portal", but at least for cursors we'd have to execute
the ModifyTable nodes first.
Regards,
Marko Tiikkaja
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
I've previously thought about making the CTE aware of the LIMIT,
similarly to a top-N sort, but I don't think it's worth it.
That approach doesn't lead to a solution because then you could *never*
optimize it. The protocol-level limit option is always present.
It's conceivable that we could have ExecutorEnd forcibly run any DML
CTEs to the end (and fire their triggers?) before shutting down the
executor, but it seems like a kluge.
regards, tom lane
On Nov 1, 2009, at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Oct 31, 2009 at 5:00 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:

What I've had in mind is pipelining the execution only when it doesn't
have *any* impact on the outcome. This would mean only allowing it when
the top-level statement is either a SELECT or an INSERT. Also, UPDATEs
and DELETEs inside CTEs can't have the same result relations. Whether
or not we want to break the expected(?) behaviour for statement-level
triggers, I have no opinion one way or the other.

You'd also have to disallow the case when there are any triggers on
the INSERT, or where there are any triggers on anything else (because
they might access the target table of the INSERT). This will end up
being so restricted as to be useless.

Well, it's already the case that BEFORE triggers shouldn't count on
seeing any particular subset of a statement's results completed.
We could define AFTER triggers as all being fired after the entire
statement is complete (which is not the direction my patch was headed
in, but I have no allegiance to that). So I think we could define our
way out of the trigger timing issue, and I don't see any big objection
to limiting pipelining to cases where the sub-statements' target tables
don't overlap.
Hmm... yeah. If we do that, then pipelining becomes a much more
feasible optimization. I think that definition is a bit more likely
to result in POLA violations, but I'm not sure by how much.
However, this still doesn't address the problem of what happens when the
top-level select fails to read all of the CTE output (because it has a
LIMIT, or the client doesn't read all the output of a portal, etc etc).
Partially executing an update in such cases is no good.
Well, like you said elsewhere on this thread, you just have to finish
out any remaining bits after the main query completes.
...Robert