WIP fix proposal for bug #6123

Started by Kevin Grittner · over 14 years ago · 41 messages · hackers
#1 Kevin Grittner
Kevin.Grittner@wicourts.gov

We're nearing completion of testing the migration of a lot of code
which used our custom Java framework into PostgreSQL functions and
triggers. Yesterday our testers ran into surprising behavior
related to delete triggers. A test case is presented on the -bugs
list, but basically it amounts to this:

(1) We have some detail which is summarized by triggers into
related higher-level tables for performance reasons.

(2) On delete, some of the higher level tables should cascade the
delete down to the lower levels.

(3) Sometimes the same tables are involved in both.

This is complicated by the foreign key situation -- due to
conversion of less-than-perfect data and the fact that there is a
legal concept of the elected Clerk of Court in each county being the
"custodian of the data" we have orphaned detail in some tables which
we don't have authority to delete or create bogus parent rows for.
(It would actually be a felony for us to do so, I think.) Until 9.2
we won't be able to create foreign keys for these relationships, but
in each county we've created foreign keys for all relationships
without orphans. So, this is one reason we can't count on foreign
key declarations, with the ON DELETE CASCADE option, yet we don't
want to drop the foreign keys where they exist, as they help prevent
further degradation of the data integrity. So the DELETE from the
children should occur in the BEFORE trigger to avoid upsetting FK
logic. Otherwise we could move the cascading deletes to the AFTER
DELETE trigger, where this odd behavior doesn't occur.

So basically, the goal of this patch is to ensure that a BEFORE
DELETE trigger doesn't silently fail to delete a row because that
row was updated during the BEFORE DELETE trigger firing (in our case
by secondary trigger firing).
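
A minimal sketch of the shape of the problem (table, column, and
trigger names here are hypothetical, not our real schema):

create table parent (id int primary key, child_count int not null);
create table child (id int primary key, parent_id int not null);

-- Summarizing trigger: keep parent.child_count in sync.
create function child_delete_summarize() returns trigger as $$
begin
    update parent set child_count = child_count - 1
        where id = old.parent_id;
    return old;
end;
$$ language plpgsql;

create trigger child_del before delete on child
    for each row execute procedure child_delete_summarize();

-- Cascading trigger: deleting a parent deletes its children first.
create function parent_delete_cascade() returns trigger as $$
begin
    delete from child where parent_id = old.id;
    return old;
end;
$$ language plpgsql;

create trigger parent_del before delete on parent
    for each row execute procedure parent_delete_cascade();

insert into parent values (1, 1);
insert into child values (10, 1);

delete from parent where id = 1;
-- Reports DELETE 0: the child row is gone, but the parent row
-- survives, because the secondary trigger updated it while the
-- BEFORE DELETE trigger was firing.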

If that description was too hard to follow, let me know and I'll try
again. :-/

[Summarizing discussion on the -bugs list,] Tom didn't feel that
there was a need to support application code which does what I
describe above, and he felt that fixing it would open a can of
worms, with logical quandaries about correct behavior. Basically,
the changes I made were within switch statements where if the row
was found to be HeapTupleUpdated in READ COMMITTED, it would follow
the ctid chain; I used similar logic for HeapTupleSelfUpdated
regardless of transaction isolation level. The reason for not
re-firing delete triggers here is the same as the reason delete
triggers are not fired in the existing case -- it's just one delete.

No claims are made for completeness of this patch -- it's a "proof of
concept" on which I hope to get comments. Before this patch would be
production ready I would need to check for similar needs on UPDATE,
and would need to check to make sure there is no resource leakage.
It passes `make check-world`, `make installcheck-world` with a
database set for serializable transaction isolation, and the
isolation tests. And of course, it doesn't show the behavior which
we found so astonishing -- we no longer see an attempted delete of a
parent succeed in deleting the children but leave the parent sitting
there.

A patch against 9.0 based on this approach may find its way into
production here in about two weeks if there are no
contra-indications, so any review would be very much appreciated.

-Kevin

Attachments:

delete-before-fix-wip1.patch (text/plain, +25 −1)
#2 Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#1)
Re: WIP fix proposal for bug #6123

On Wed, Jul 20, 2011 at 2:58 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

So basically, the goal of this patch is to ensure that a BEFORE
DELETE trigger doesn't silently fail to delete a row because that
row was updated during the BEFORE DELETE trigger firing (in our case
by secondary trigger firing).

I've run across this problem before while writing application code and
have found it surprising, unfortunate, and difficult to work around.
It's not so bad when it only bites you once, but as things get more
complicated and you have more and more triggers floating around, it
gets pretty darn horrible. One of the things I've done to mitigate
the impact of this is to write as many triggers as possible as AFTER
triggers, but that's sometimes difficult to accomplish.

Your scenario is a BEFORE DELETE trigger that does an UPDATE on the
same row, but I think this problem also occurs if you have a BEFORE
UPDATE trigger that does an UPDATE on the same row. I believe the
second update gets silently ignored.

I'm unfortunately unqualified to speak to the correctness of your
patch, but I concur with your view that the current behavior is lousy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#2)
Re: WIP fix proposal for bug #6123

Robert Haas <robertmhaas@gmail.com> wrote:

I think this problem also occurs if you have a BEFORE
UPDATE trigger that does an UPDATE on the same row.

I'm pretty sure you're right, and I do intend to cover that in the
final patch.

-Kevin

#4 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#2)
Re: WIP fix proposal for bug #6123

Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 20, 2011 at 2:58 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

So basically, the goal of this patch is to ensure that a BEFORE
DELETE trigger doesn't silently fail to delete a row because that
row was updated during the BEFORE DELETE trigger firing (in our
case by secondary trigger firing).

I've run across this problem before while writing application code
and have found it surprising, unfortunate, and difficult to work
around. It's not so bad when it only bites you once, but as things
get more complicated and you have more and more triggers floating
around, it gets pretty darn horrible. One of the things I've done
to mitigate the impact of this is to write as many triggers as
possible as AFTER triggers

Yeah, this is not an issue in AFTER triggers, so moving any updating
to those is a solution. In most cases that's where you want
triggered modifications to take place anyway. The cascading delete
situation is the most obvious exception, although there certainly
could be others.
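
For example, the count maintenance from the sketch in my first
message moves cleanly into an AFTER trigger (hypothetical names
again); the triggering row change is already complete when it fires:

create function child_count_maint() returns trigger as $$
begin
    if tg_op = 'INSERT' then
        update parent set child_count = child_count + 1
            where id = new.parent_id;
        return new;
    else  -- 'DELETE'
        update parent set child_count = child_count - 1
            where id = old.parent_id;
        return old;
    end if;
end;
$$ language plpgsql;

create trigger child_count_trig after insert or delete on child
    for each row execute procedure child_count_maint();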

but that's sometimes difficult to accomplish.

Yeah, sometimes.

Your scenario is a BEFORE DELETE trigger that does an UPDATE on
the same row, but I think this problem also occurs if you have a
BEFORE UPDATE trigger that does an UPDATE on the same row. I
believe the second update gets silently ignored.

My testing shows that the primary update gets ignored, while all the
triggered effects of that update are persisted. Yuck. :-( It
certainly seems possible to turn that around, but that hardly seems
better. In asking application programmers here what they would
*expect* to happen, they all seem to think that it is surprising
that the BEFORE trigger functions *return a record*, rather than a
boolean to say whether to proceed with the operation. They feel it
would be less confusing if a value set into NEW was effective if the
operation does take effect, and the boolean controls whether or not
that happens. They rather expect that if an update happens from the
same transaction while a before trigger is running, the NEW
record will reflect the change.
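
(For concreteness, the sort of test that showed this -- a sketch
with hypothetical names; the guard on "b" exists only to stop the
triggered update from recursing:)

create table u (id int primary key, a int not null, b int not null);

create function u_before_upd() returns trigger as $$
begin
    if new.b = 0 then
        update u set b = 1 where id = new.id;  -- triggered update
    end if;
    return new;
end;
$$ language plpgsql;

create trigger u_upd before update on u
    for each row execute procedure u_before_upd();

insert into u values (1, 0, 0);
update u set a = 1 where id = 1;
-- Reports UPDATE 0, and afterwards a = 0 and b = 1: the triggered
-- update persisted, while the primary update was silently ignored.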

I recognize how hard it would be to create that expected behavior,
and how unlikely it is that the community would accept such a change
at this point. But current behavior is to silently do something
really dumb, so I think some change should be considered -- even if
that change is to throw an error where we now allow nonsense.

INSERT is not a problem -- if a BEFORE INSERT trigger inserts a row
with a conflicting primary key (or other unique index key), the
operation will be rolled back. That's fine.

I think DELETE can be cleanly fixed with a patch similar to what I
posted earlier in the thread. I found one more value that looks
like it should be set, and it could use some comments, but I believe
that we can get DELETE behavior which is every bit as sensible as
INSERT behavior with a very small change.

The worms do come crawling out of the can on BEFORE UPDATE triggers,
though. When faced with an UPDATE which hasn't yet been applied,
and other UPDATEs triggering from within the BEFORE UPDATE trigger
which touch the same row, it doesn't seem like you can honor both
the original UPDATE which was requested *and* the triggered UPDATEs.
Of course, if you discard the original UPDATE, you can't very well
do anything with the record returned from the BEFORE UPDATE trigger
for that update. Since it seems equally evil to allow the update
while ignoring some of the work caused by its trigger functions as
to show the work of the triggered updates while suppressing the
original update, I think the right thing is to throw an error if the
old row for a BEFORE UPDATE is updated by the same transaction and
the trigger function ultimately returns a non-NULL value.

Thoughts?

-Kevin

#5 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#4)
Re: WIP fix proposal for bug #6123

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

I believe that we can get DELETE behavior which is every bit as
sensible as INSERT behavior with a very small change.

I think the right thing is to throw an error if the old row for a
BEFORE UPDATE is updated by the same transaction and the trigger
function ultimately returns a non-NULL value.

And to make this a bit less hand-wavy, a rough patch attached. I
expect the error message could use some word-smithing, and it could
use comments; but it seemed like something concrete might speed
things along.

-Kevin

Attachments:

bug6123-v1.patch (text/plain, +29 −1)
#6 Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#4)
Re: WIP fix proposal for bug #6123

On Fri, Jul 22, 2011 at 5:01 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Your scenario is a BEFORE DELETE trigger that does an UPDATE on
the same row, but I think this problem also occurs if you have a
BEFORE UPDATE trigger that does an UPDATE on the same row.  I
believe the second update gets silently ignored.

My testing shows that the primary update gets ignored, while all the
triggered effects of that update are persisted.  Yuck.  :-(

That was my recollection...

It
certainly seems possible to turn that around, but that hardly seems
better.

Agreed.

In asking application programmers here what they would
*expect* to happen, they all seem to think that it is surprising
that the BEFORE trigger functions *return a record*, rather than a
boolean to say whether to proceed with the operation.  They feel it
would be less confusing if a value set into NEW was effective if the
operation does take effect, and the boolean controls whether or not
that happens.  They rather expect that if an update happens from the
same transaction while a before trigger is running, the NEW
record will reflect the change.

I think this is mostly a matter of what you get familiar with, and, as
you say, not worth breaking compatibility for.

I recognize how hard it would be to create that expected behavior,
and how unlikely it is that the community would accept such a change
at this point.  But current behavior is to silently do something
really dumb, so I think some change should be considered -- even if
that change is to throw an error where we now allow nonsense.

INSERT is not a problem -- if a BEFORE INSERT trigger inserts a row
with a conflicting primary key (or other unique index key), the
operation will be rolled back.  That's fine.

I think DELETE can be cleanly fixed with a patch similar to what I
posted earlier in the thread.  I found one more value that looks
like it should be set, and it could use some comments, but I believe
that we can get DELETE behavior which is every bit as sensible as
INSERT behavior with a very small change.

The worms do come crawling out of the can on BEFORE UPDATE triggers,
though.  When faced with an UPDATE which hasn't yet been applied,
and other UPDATEs triggering from within the BEFORE UPDATE trigger
which touch the same row, it doesn't seem like you can honor both
the original UPDATE which was requested *and* the triggered UPDATEs.
Of course, if you discard the original UPDATE, you can't very well
do anything with the record returned from the BEFORE UPDATE trigger
for that update.  Since it seems equally evil to allow the update
while ignoring some of the work caused by its trigger functions as
to show the work of the triggered updates while suppressing the
original update, I think the right thing is to throw an error if the
old row for a BEFORE UPDATE is updated by the same transaction and
the trigger function ultimately returns a non-NULL value.

Thoughts?

Well, it seems to me that if the trigger update and the main update
were executed as separate commands (with no triggers involved) it
would often be the case that they'd dovetail nicely. When this has
come up for me, it's usually been the case that the sets of fields
being updated are completely non-overlapping. So ideally what I'd
like to happen is to have EPQ (EvalPlanQual), or something like it,
test whether the
newest version of the row still satisfies the UPDATE criteria. If so,
it applies the update to the new row version; if not, it either
discards the main UPDATE or throws an error. There's still some room
here for surprising results, but I think they would be surprising
results arising out of having done something intrinsically
complicated, rather than surprising results arising out of an odd
implementation artifact.
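
To sketch the kind of case I mean (hypothetical table; the outer
statement and the trigger's update touch disjoint columns, so as two
plain commands they compose fine):

create table orders (id int primary key, status text,
                     last_modified timestamptz);

insert into orders values (42, 'new', now());

update orders set status = 'shipped' where id = 42;     -- outer
update orders set last_modified = now() where id = 42;  -- trigger's

-- The idea would be to re-apply the outer UPDATE to the newest row
-- version, provided that version still satisfies the WHERE clause.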

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#6)
Re: WIP fix proposal for bug #6123

Robert Haas <robertmhaas@gmail.com> wrote:

Well, it seems to me that if the trigger update and the main
update were executed as separate commands (with no triggers
involved) it would often be the case that they'd dovetail nicely.
When this has come up for me, it's usually been the case that the
sets of fields being updated are completely non-overlapping.

Agreed that this is typically the case -- that's why the application
programmers here expected NEW to be effectively a dynamic
representation of the WIP state of the row. A lot of things would
"just work" that way. Of course, they're blissfully unaware of what
a huge revamp of the guts of PostgreSQL that would be.

So ideally what I'd like to happen is to have EPQ, or something
like it, test whether the newest version of the row still
satisfies the UPDATE criteria. If so, it applies the update to
the new row version; if not, it either discards the main UPDATE or
throws an error. There's still some room here for surprising
results, but I think they would be surprising results arising out
of having done something intrinsically complicated, rather than
surprising results arising out of an odd implementation artifact.

So, you're advocating a "logical merge" of the results with
something exceptional done on a conflicting update of the same
columns? That would effectively get you to the same end result as a
"live" NEW tuple, but without such a radical revamp of the guts of
things. Still, not trivial to do properly, and I would argue for
throwing an error rather than silently doing something surprising on
conflict.

This issue has already forced the rearrangement of our release
schedule here, so I'm going to do the simple fix of just throwing an
error on update from the BEFORE UPDATE trigger (of the row for which
the trigger is firing). That fix is very simple and seems very safe
to me, and should allow us to deploy without further schedule
slippage; then I'll see if I can code up what you're suggesting. I
had a new patch I was about to post with new error language, a
different SQLSTATE, comments, and regression test changes; but
unless someone wants to see that I won't clutter the list with it
until I've had a chance to see if I can manage to handle it the way
you're requesting.

There's no doubt that it would be better the way you're suggesting;
but it looks to me like about five times as many lines of code,
harder to be sure it's right, and probably forcing me to learn a few
new subsystems of PostgreSQL internals to accomplish.

Thanks for the feedback.

-Kevin

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#7)
Re: WIP fix proposal for bug #6123

On Mon, Jul 25, 2011 at 12:26 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

There's no doubt that it would be better the way you're suggesting;
but it looks to me like about five times as many lines of code,
harder to be sure it's right, and probably forcing me to learn a few
new subsystems of PostgreSQL internals to accomplish.

Sorry, I didn't mean to make homework for you. Nor am I sure that the
solution will pass muster all around even if I think it's the best thing
since sliced bread. I was just throwing it out there as what I would
like to have happen in an ideal world...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#8)
Re: WIP fix proposal for bug #6123

Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jul 25, 2011 at 12:26 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

There's no doubt that it would be better the way you're
suggesting; but it looks to me like about five times as many
lines of code, harder to be sure it's right, and probably forcing
me to learn a few new subsystems of PostgreSQL internals to
accomplish.

Sorry, I didn't mean to make homework for you. Nor am I sure that
the solution will pass muster all around even if I think it's the
best thing since sliced bread. I was just throwing it out there
as what I would like to have happen in an ideal world...

Well, if it can be done, it will be better and less likely to break
existing code, so it's at least worth looking at. I don't object to
broadening my horizons. ;-) Sorry if it sounded like a complaint;
my intention was to communicate that I'm going to be looking at it,
but I've got a few more urgent tasks to deal with first to get our
application release out the door.

By the way, my current patch does break two existing UPDATE
statements in the regression test misc.sql file:

| -- This non-func update stuff needs to be examined
| -- more closely. - jolly (2/22/96)
| --
| UPDATE tmp
| SET stringu1 = reverse_name(onek.stringu1)
| FROM onek
| WHERE onek.stringu1 = 'JBAAAA' and
| onek.stringu1 = tmp.stringu1;
|
| UPDATE tmp
| SET stringu1 = reverse_name(onek2.stringu1)
| FROM onek2
| WHERE onek2.stringu1 = 'JCAAAA' and
| onek2.stringu1 = tmp.stringu1;

Perhaps it's time....

-Kevin

#10 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#9)
Re: WIP fix proposal for bug #6123

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

By the way, my current patch does break two existing UPDATE
statements in the regression test misc.sql file:

Fixed in the attached version of the patch.

I consider the trigger behavior addressed by this patch to be a bug,
and serious enough to merit inclusion of a fix in the 9.1 release,
even at this late stage. I don't think it should be back-patched,
because it changes behavior that there is some remote chance that
somebody somewhere understands and is intentionally using. While I
agree that Robert's suggested approach (or something functionally
equivalent) would be the ideal long-term solution, I think that it
would be too large of a change to consider for 9.1. I am suggesting
a minimal "defensive" change for 9.1, with possible development of a
fancier approach in a later release.

To recap: a trigger for UPDATE BEFORE EACH ROW or DELETE BEFORE EACH
ROW, which directly or indirectly causes update of the OLD row in
the trigger, can cause the triggering operation to be silently
ignored even though the trigger returns a non-NULL record, and even
though all triggered modifications are persisted. The only clue
that an incomplete and incongruous set of operations has been
performed is that the UPDATE or DELETE count from the statement is
reduced by the number of rows on which this occurred: if an UPDATE
would have affected 5 rows, but one of them just fired triggers *as
though* it had been updated but was actually left untouched by the
original UPDATE statement, you would get "UPDATE 4" rather than
"UPDATE 5".

This patch causes DELETE to behave as most people would expect, and
throws an ERROR on the conflicting UPDATE case.

So far, beyond my confirmation that it passes all PostgreSQL
regression tests and works as expected in a few ad hoc tests I've
done, there have been six full-time days of business analysts here
testing our applications with a version of 9.0.4 patched this way,
confirming that the application works as expected. They have a
standard set of testing scripts they use for every release, and
programmers have added tests specifically targeted at this area.
Plus the analysts are trying to exercise as many paths to data
modification as possible, to stress the triggers, including
interfaces with other agencies. They are scheduled to do a minimum
of 20 more full-time days of testing in the next two weeks, so if
someone wants to suggest changes or alternatives to this particular
patch, there would still be time to get over 100 hours of
professional testing against it -- if it comes through soon enough.

-Kevin

Attachments:

bug6123-v2.patch (text/plain, +218 −21)
#11 Florian Pflug
fgp@phlo.org
In reply to: Kevin Grittner (#10)
Re: WIP fix proposal for bug #6123

On Aug 1, 2011, at 20:02, Kevin Grittner wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
I consider the trigger behavior addressed by this patch to be a bug,
and serious enough to merit inclusion of a fix in the 9.1 release,
even at this late stage.

I'm sorry but I disagree, on multiple grounds.

First, I'm not sure this is even a bug. To me, it seems that postgres
currently resolves an inherently ambiguous situation in one possible
way, while you expect it to pick another. It might be that the behaviour
that you suggest is more in line with people's expectations (More on
that below), but that still doesn't make the existing behaviour wrong.

Secondly, even if one considers the current behaviour to be wrong,
that still doesn't prove that your proposed behaviour is any better
(More on that below too). There might very well be situations in which
the current behaviour is preferable over the behaviour you suggest,
and maybe these situations turn out to be much more common than anyone
anticipates right now. But if we apply the patch now, the chance of that
being discovered before 9.1 is released is virtually zero. And after
the release there's no going back, so we might end up changing the
behaviour to the worse.

And lastly, I believe that your ultimate goal of guaranteeing that
a row is actually deleted (updated) after a BEFORE DELETE (BEFORE UPDATE)
trigger has fired is actually insurmountable anyway, at least for
isolation level READ COMMITTED. We don't seem to lock rows before firing
BEFORE DELETE or BEFORE UPDATE triggers, so a row may very well be
updated by a concurrent transaction between the firing of these triggers
and the actual DELETE or UPDATE. In READ COMMITTED mode, we'll then
re-check if the original WHERE condition still applies (using the
EvalPlanQual machinery), and only go forward with the modification
if it does.

Having said all that, I don't doubt that the current behaviour is a
source of pain for you, and I do believe that we ought to improve things -
but not by replacing one unsatisfactory behaviour with another. The
root of the issue seems to be that you're trying to do things in a
BEFORE trigger that really belong into an AFTER trigger. If you could
explain why using an AFTER trigger doesn't work for you, maybe we could
improve them to be suitable for your purposes (and please forgive me
if you already did that).

To recap: a trigger for UPDATE BEFORE EACH ROW or DELETE BEFORE EACH
ROW, which directly or indirectly causes update of the OLD row in
the trigger, can cause the triggering operation to be silently
ignored even though the trigger returns a non-NULL record, and even
though all triggered modifications are persisted. The only clue
that an incomplete and incongruous set of operations has been
performed is that the UPDATE or DELETE count from the statement is
reduced by the number of rows on which this occurred: if an UPDATE
would have affected 5 rows, but one of them just fired triggers *as
though* it had been updated but was actually left untouched by the
original UPDATE statement, you would get "UPDATE 4" rather than
"UPDATE 5".

This patch causes DELETE to behave as most people would expect, and
throws an ERROR on the conflicting UPDATE case.

I'm prepared to argue that whether or not people expect the behaviour
that patch implements depends entirely on how you phrase the question.
If you ask "do you expect a row to be actually deleted after BEFORE
DELETE triggers have run", you'll undoubtedly hear "yes". But if you
ask "do you expect a row to be deleted even though it didn't match
the DELETE's WHERE condition" I bet you'll hear a very firm "no". And
I expect an equally firm "no" if you ask "do you expect the row
that is actually deleted and the contents of the variable OLD in
the delete trigger to differ".

But the behaviour indicated in the second and third question *is* what
happens with your patch applied. Here's an example

create table t (id int);

create or replace function on_t_delete() returns trigger as $$
begin
    raise notice 'deleting row %', old.id;
    update t set id = -id where id = old.id;
    return old;
end;
$$ language plpgsql volatile;

create trigger t_delete
    before delete on t
    for each row execute procedure on_t_delete();

insert into t (id) values (1);
insert into t (id) values (2);

delete from t where id > 0;

Without your patch, the DELETE doesn't remove any rows, because
they're updated in on_t_delete(). With your patch applied, the
rows are removed - even though, due to the UPDATE in on_t_delete(),
they *don't* match the DELETE's WHERE condition (id > 0) at the time
they're deleted.

best regards,
Florian Pflug

#12 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Florian Pflug (#11)
Re: WIP fix proposal for bug #6123

Florian Pflug <fgp@phlo.org> wrote:

On Aug1, 2011, at 20:02 , Kevin Grittner wrote:

I consider the trigger behavior addressed by this patch to be a
bug, and serious enough to merit inclusion of a fix in the 9.1
release, even at this late stage.

I'm sorry but I disagree, on multiple grounds.

Thanks for offering your thoughts!

First, I'm not sure this is even a bug. To me, it seems that
postgres currently resolves an inherently ambiguous situation in
one possible way, while you expect it to pick another. It might be
that the behaviour that you suggest is more in line with people's
expectations (More on that below), but that still doesn't make the
existing behaviour wrong.

That point would be a hard sell for me. To silently turn a BEFORE
trigger into an INSTEAD OF trigger invites data corruption, in the
sense that business rules that you thought were unconditionally
enforced in triggers can be broken with no obvious hint that it
happened.

Secondly, even if one considers the current behaviour to be wrong,
that still doesn't prove that your proposed behaviour is any
better

Granted. I tried to phrase my post to not preclude other solutions.

There might very well be situations in which the current behaviour
is preferable over the behaviour you suggest, and maybe these
situations turn out to be much more common than anyone anticipates
right now.

My imagination is not up to the task of putting any plausible
situations forward where current behavior is a good thing. Of
course I realize that's not proof of anything, but it leaves me
skeptical.

But if we apply the patch now, the chance of that being discovered
before 9.1 is released is virtually zero. And after the release
there's no going back, so we might end up changing the behaviour
to the worse.

Perhaps throwing an error on the DELETE case as well as the UPDATE
case would address this concern. That would have saved weeks of
staff time here when we started seeing the mysterious behavior which
currently exists. The only way we finally got to the bottom of it
was to step through execution in gdb; those users not equipped to do
so might have bigger problems than we did.

We would probably still be patching the version we use in our shop
to actually *do* the delete if no error is thrown and no BEFORE
trigger returns NULL, but we would have been able to get there a lot
faster.

And lastly, I believe that your ultimate goal of guaranteeing that
a row is actually deleted (updated) after a BEFORE DELETE (BEFORE
UPDATE) trigger has fired is actually insurmountable anyway, at
least for isolation level READ COMMITTED.

Hmm. That's a very potent point. I had not considered READ
COMMITTED because we have never, ever used it under PostgreSQL.
Obviously, any solution has to work for those who *do* use it, too;
and the behavior between different isolation levels can't be too
convoluted. So, as you say, the *ultimate* goal may be
unattainable.

Having said all that, I don't doubt that the current behaviour is
source of pain for you,

It has thrown off our production development and deployment schedule
by weeks, and management has considered shelving the whole concept
of using PostgreSQL triggers because of it. I've convinced them
that this is a surmountable obstacle, and am trying to get things
back on track.

and I do believe that we ought to improve things - but not by
replacing one unsatisfactory behaviour with another.

I hope you're not suggesting *that's* my goal! ;-)

The root of the issue seems to be that you're trying to do things
in a BEFORE trigger that really belong into an AFTER trigger.

I'll take that as a *collective* pronoun, since I haven't personally
written any of the thousands of triggers. There are 20 programmers
doing that. Part of the problem is that they sometimes put things
in a BEFORE trigger that belong in an AFTER trigger. I caught one
place they were doing an UPDATE of the target row in a BEFORE
trigger rather than setting the values in the NEW record. My patch
makes the latter throw an error, as I believe it should, rather than
silently leaving the data in a bad state.

If you could explain why using an AFTER trigger doesn't work for
you, maybe we could improve them to be suitable for your purposes
(and please forgive me if you already did that).

I did to some extent, but really perhaps the biggest issue (which I
don't think I've really covered earlier in the thread) is to not
silently corrupt the data. I would settle for throwing an error on
the DELETE case as well as the UPDATE case, because my data would
then be protected, and programmers would be forced to work around
the issue in a way that PostgreSQL can handle correctly.

Robert said that he has mainly run into this on BEFORE UPDATE
triggers, so perhaps he can elaborate on the usage patterns that run
into it there.

In my case it is mainly an issue in deletes (and the special case of
a "purge" process where deletes are not normally allowed) doing a
depth-first search for dependent records, and deleting from the
bottom up. In some cases there is redundant information in higher
level tables based on primary data in lower level tables -- in the
form of counts, sums, or status codes. If you delete a case, and
that "cascades" to a event history table which has a trigger which
updates case status according to certain business rules, the delete
of the case is canceled because of the delete of a status-changing
event. That's very bad. If we at least threw an error instead, we
could identify the problem reliably, and code around it somehow.
That might be by moving the delete of dependent records to a point
after the parent record has already been deleted, but that runs the
risk of triggering other validation failures based on business rules
in the triggers, because the triggers would then be seeing a state
we try very hard to prevent.

To recap: a trigger for UPDATE BEFORE EACH ROW or DELETE BEFORE
EACH ROW, which directly or indirectly causes update of the OLD
row in the trigger, can cause the triggering operation to be
silently ignored even though the trigger returns a non-NULL
record, and even though all triggered modifications are
persisted. The only clue that an incomplete and incongruous set
of operations has been performed is that the UPDATE or DELETE
count from the statement is reduced by the number of rows on
which this occurred: if an UPDATE would have affected 5 rows, but
one of them just fired triggers *as though* it had been updated
but was actually left untouched by the original UPDATE statement,
you would get "UPDATE 4" rather than "UPDATE 5".

This patch causes DELETE to behave as most people would expect,
and throws an ERROR on the conflicting UPDATE case.

I'm prepared to argue that whether or not people expect the
behaviour that patch implements depends entirely on how you phrase
the question. If you ask "do you expect a row to be actually
deleted after BEFORE DELETE triggers have run", you'll undoubtedly
hear "yes". But if you ask "do you expect a row to be deleted even
though it didn't match the DELETE's WHERE condition" I bet you'll
hear a very firm "no".

OK. I guess I've neglected that possibility based on my transaction
isolation bias and the fact that the types of changes being made in
the cascade of deletes wouldn't tend to create such a situation.
But stepping back, I see your point there.

And I expect an equally firm "no" if you ask "do you expect the
row that is actually deleted and the contents of the variable OLD
in the delete trigger to differ".

I'm less certain about that, especially if you phrase it in terms of
denormalization optimizations -- the redundant data in the form of
counts, sums, status codes, or sometimes just copied data from child
records stored in parents to support better performance on common
queries. That's the main issue for me.

But the behaviour indicated in the second and third question *is*
what happens with your patch applied. Here's an example

create table t (id int);

create or replace function on_t_delete() returns trigger as $$
begin
    raise notice 'deleting row %', old.id;
    update t set id = -id where id = old.id;
    return old;
end;
$$ language plpgsql volatile;

create trigger t_delete
    before delete on t
    for each row execute procedure on_t_delete();

insert into t (id) values (1);
insert into t (id) values (2);

delete from t where id > 0;

Without your patch, the DELETE doesn't remove any rows, because
they're updated in on_t_delete(). With your patch applied, the
rows are removed - even though, due to the UPDATE in
on_t_delete(), they *don't* match the DELETE's WHERE condition
(id > 0) at the time they're deleted.

That's not the sort of situation we're seeing here, but I get your
point.

Would you feel comfortable with a patch which threw an error on the
DELETE case, as it does on the UPDATE case?

-Kevin

#13 Florian Pflug
fgp@phlo.org
In reply to: Kevin Grittner (#12)
Re: WIP fix proposal for bug #6123

On Aug 2, 2011, at 17:03, Kevin Grittner wrote:

Florian Pflug <fgp@phlo.org> wrote:

First, I'm not sure this is even a bug. To me, it seems that
postgres currently resolves an inherently ambiguous situation in
one possible way, while you expect it to pick another. It might be
that the behaviour that you suggest is more in line with people's
expectations (More on that below), but that still doesn't make the
existing behaviour wrong.

That point would be a hard sell for me. To silently turn a BEFORE
trigger into an INSTEAD OF trigger invites data corruption, in the
sense that business rules that you thought were unconditionally
enforced in triggers can be broken with no obvious hint that it
happened.

Hm, I can see your point. But I still maintain that a trigger that
acts based on values of OLD which don't reflect the actual tuple
being deleted is an equally dangerous source of data corruption.

There might very well be situations in which the current behaviour
is preferable over the behaviour you suggest, and maybe these
situations turn out to be much more common than anyone anticipates
right now.

My imagination is not up to the task of putting any plausible
situations forward where current behavior is a good thing. Of
course I realize that's not proof of anything, but it leaves me
skeptical.

One situation I had in mind was a table whose records contain a
kind of reference count. A cleanup process might then issue a DELETE
which removes all entries whose reference count is zero. Now, if such
a table contains a BEFORE DELETE trigger which, through some elaborate
chains of triggers, manages to increment some row's reference count
that was initially zero, it'd be a bad idea for the DELETE to remove it.
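
A contrived sketch of that situation (hypothetical names; the
trigger body stands in for the elaborate chain):

create table objects (id int primary key, refcount int not null);

create function on_objects_delete() returns trigger as $$
begin
    -- Deleting one row ends up incrementing a neighbour's count.
    update objects set refcount = refcount + 1
        where id = old.id + 1;
    return old;
end;
$$ language plpgsql;

create trigger objects_del before delete on objects
    for each row execute procedure on_objects_delete();

insert into objects values (1, 0), (2, 0);

delete from objects where refcount = 0;
-- Once row 1's trigger has bumped row 2's refcount above zero,
-- going ahead and deleting row 2 anyway would be exactly the
-- wrong thing to do.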

But my actual point was not so much that *I* have a reasonable use-case
in mind in which the current behaviour is preferable, but more that
*someone* might.

But if we apply the patch now, the chance of that being discovered
before 9.1 is released is virtually zero. And after the release
there's no going back, so we might end up changing the behaviour
to the worse.

Perhaps throwing an error on the DELETE case as well as the UPDATE
case would address this concern.

It addresses all concerns except this one, I'm afraid. Whatever we
change the behaviour to, I feel that we need to give ourselves plenty
of time to tweak it, should problems arise. Which just isn't possible
before 9.1 is released. Or at least that's my opinion.

That would have saved weeks of
staff time here when we started seeing the mysterious behavior which
currently exists. The only way we finally got to the bottom of it
was to step through execution in gdb; those users not equipped to do
so might have bigger problems than we did.

Ouch.

And lastly, I believe that your ultimate goal of guaranteeing that
a row is actually deleted (updated) after a BEFORE DELETE (BEFORE
UPDATE) trigger has fired is actually insurmountable anyway, at
least for isolation level READ COMMITTED.

Hmm. That's a very potent point. I had not considered READ
COMMITTED because we have never, ever used it under PostgreSQL.
Obviously, any solution has to work for those who *do* use it, too;
and the behavior between different isolation levels can't be too
convoluted. So, as you say, the *ultimate* goal may be
unattainable.

After reading the code again, I take that back. We *do* seem to lock
the tuple before firing BEFORE {UPDATE,DELETE} triggers - trigger.c's
GetTupleForTrigger() takes care of that. I previously missed that because
I only looked at ExecUpdate() and ExecDelete() in nodeModifyTable.c.
Sorry for that.

and I do believe that we ought to improve things - but not by
replacing one unsatisfactory behaviour with another.

I hope you're not suggesting *that's* my goal! ;-)

Nah, more of a case of bad wording on my part...

The root of the issue seems to be that you're trying to do things
in a BEFORE trigger that really belong into an AFTER trigger.

I'll take that as a *collective* pronoun, since I haven't personally
written any of the thousands of triggers. There are 20 programmers
doing that. Part of the problem is that they sometimes put things
in a BEFORE trigger that belong in an AFTER trigger. I caught one
place they were doing an UPDATE of the target row in a BEFORE
trigger rather than setting the values in the NEW record. My patch
makes the latter throw an error, as I believe it should, rather than
silently leaving the data in a bad state.

If you could explain why using an AFTER trigger doesn't work for
you, maybe we could improve them to be suitable for your purposes
(and please forgive me if you already did that).

I did to some extent, but really perhaps the biggest issue (which I
don't think I've really covered earlier in the thread) is to not
silently corrupt the data. I would settle for throwing an error on
the DELETE case as well as the UPDATE case, because my data would
then be protected, and programmers would be forced to work around
the issue in a way that PostgreSQL can handle correctly.

Hm, OK, I see your point now, I believe. This is not so much about
wanting to put things into BEFORE triggers which don't really
fit there, but instead about avoiding weeks of debugging if someone
messes up.

In my case it is mainly an issue in deletes (and the special case of
a "purge" process where deletes are not normally allowed) doing a
depth-first search for dependent records, and deleting from the
bottom up. In some cases there is redundant information in higher
level tables based on primary data in lower level tables -- in the
form of counts, sums, or status codes. If you delete a case, and
that "cascades" to a event history table which has a trigger which
updates case status according to certain business rules, the delete
of the case is canceled because of the delete of a status-changing
event. That's very bad. If we at least threw an error instead, we
could identify the problem reliably, and code around it somehow.
That might be by moving the delete of dependent records to a point
after the parent record has already been deleted, but that runs the
risk of triggering other validation failures based on business rules
in the triggers, because the triggers would then be seeing a state
we try very hard to prevent.

If we go with my suggestion below (which entails throwing the error
earlier, at the time of the offending update or delete), you could
get the non-throwing behaviour you want by protecting with EXCEPTION
blocks those UPDATEs and DELETEs which might touch rows that are
already in the process of being updated or deleted.
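
Hypothetically, with tables a and b, B's trigger might then look
something like this (the SQLSTATE is a placeholder -- no such error
code exists yet, and neither do these tables or columns):

create or replace function on_b_update() returns trigger as $$
begin
    begin
        update a set summary = summary + 1 where id = new.a_id;
    exception
        when sqlstate 'XXXXX' then
            -- The row of "a" is already mid-UPDATE/DELETE in this
            -- transaction; skip the summary update.
            null;
    end;
    return new;
end;
$$ language plpgsql;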

And equally firm do I expect to be the "no" if you ask "do you
expect the row that is actually deleted and the contents of the
variable OLD in the delete trigger to differ".

I'm less certain about that, especially if you phrase it in terms of
denormalization optimizations -- the redundant data in the form of
counts, sums, status codes, or sometimes just copied data from child
records stored in parents to support better performance on common
queries. That's the main issue for me.

Yeah, but the problem is we cannot guarantee that all the "important"
fields will match ;-)

Would you feel comfortable with a patch which threw an error on the
DELETE case, as it does on the UPDATE case?

Yes, though with a little additional twist. The twist being that I'd
like the error to be thrown earlier, at the time of the offending
UPDATE or DELETE, not later, when the original UPDATE or DELETE which
caused the BEFORE trigger invocation stumbles over the updated row.
It not only seems more logical that way, but it also makes it possible
for the trigger to catch the error, and react accordingly.

For example, if a BEFORE trigger on table A modifies table B, and
one of B's BEFORE triggers in turn modifies A, B's BEFORE trigger
could then catch the error and e.g. decide to skip the UPDATE.

Implementation-wise, I imagine we'd set a flag in the tuple header
after locking the tuple (as I now discovered we do, sorry again)
but before firing BEFORE triggers. We'd clear the flag again
once all BEFORE triggers have run, and let the actual UPDATE
or DELETE proceed. If an UPDATE or DELETE encounters a flagged
tuple, and the transaction (or one of its parents) is the current
lock holder, we'd throw an error. To clean up after aborted
transactions, we'd clear the flag upon acquiring a tuple lock.

Instead of a flag in the tuple header, we could also keep
a (backend-local) list of ctids for which we've fired BEFORE
triggers but haven't done the actual modification yet.

We might also want to make it possible to distinguish pending
UPDATES from pending DELETES (i.e., choose a different error code for
the two cases), because it seems reasonable that code would want
to react differently in those cases (e.g., skip an UPDATE if there's
a pending DELETE).

best regards,
Florian Pflug

#14 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Florian Pflug (#13)
Re: WIP fix proposal for bug #6123

Florian Pflug <fgp@phlo.org> wrote:

On Aug2, 2011, at 17:03 , Kevin Grittner wrote:

Hm, OK, I see your point now, I believe. This is not so much about
wanting to put things into BEFORE triggers which don't really
fit there, but instead about avoiding weeks of debugging if someone
messes up.

I would say that is the overriding concern. But there are arguments
for sometimes putting some things in the BEFORE triggers which could
raise the issue, too. For us, there would be a lot to dance around
if deletes cascade in the AFTER triggers, since we'd have orphans
hanging around during some validations, as described below.

In my case it is mainly an issue in deletes (and the special case
of a "purge" process where deletes are not normally allowed)
doing a depth-first search for dependent records, and deleting
from the bottom up. In some cases there is redundant information
in higher level tables based on primary data in lower level
tables -- in the form of counts, sums, or status codes. If you
delete a case, and that "cascades" to an event history table which
has a trigger that updates case status according to certain
business rules, the delete of the case is canceled because of the
delete of a status-changing event. That's very bad. If we at
least threw an error instead, we could identify the problem
reliably, and code around it somehow. That might be by moving
the delete of dependent records to a point after the parent
record has already been deleted, but that runs the risk of
triggering other validation failures based on business rules in
the triggers, because the triggers would then be seeing a state
we try very hard to prevent.

If we go with my suggestion below (which entails throwing the
error earlier, at the time of the offending update or delete), you
could get the non-throwing behaviour you want by protecting with
EXCEPTION blocks those UPDATEs and DELETEs which might touch rows
that are already in the process of being updated or deleted.

Would you feel comfortable with a patch which threw an error on
the DELETE case, as it does on the UPDATE case?

Yes, though with a little additional twist. The twist being that
I'd like the error to be thrown earlier, at the time of the
offending UPDATE or DELETE, not later, when the original UPDATE or
DELETE which caused the BEFORE trigger invocation stumbles over
the updated row. It not only seems more logical that way, but it
also makes it possible for the trigger to catch the error, and
react accordingly.

I hadn't thought of that. It does seem better in every way except
for how much work it takes to do it and how much testing it needs to
have confidence in it. Definitely not 9.1 material.

That's OK -- we have something which should work for us in 9.0 and
9.1. I'd really prefer not to "fork" this permanently, but if
there's a better way coming in 9.2, we can use our patch for a year
or so and then switch over to the superior capabilities available in
9.2.

For example, if a BEFORE trigger on table A modifies table B, and
one of B's BEFORE triggers in turn modifies A, B's BEFORE trigger
could then catch the error and e.g. decide to skip the UPDATE.

Yeah, that provides a reasonable default but still gives the
application programmer fine-grained control on this. That's a good
thing.

Implementation-wise, I imagine we'd set a flag in the tuple header
after locking the tuple (as I now discovered we do, sorry again)
but before firing BEFORE triggers. We'd clear the flag again
once all BEFORE triggers have run, and let the actual UPDATE
or DELETE proceed. If an UPDATE or DELETE encounters a flagged
tuple, and the transaction (or one of its parents) is the current
lock holder, we'd throw an error. To clean up after aborted
transactions, we'd clear the flag upon acquiring a tuple lock.

Instead of a flag in the tuple header, we could also keep
a (backend-local) list of ctids for which we've fired BEFORE
triggers but haven't done the actual modification yet.

We might also want to make it possible to distinguish pending
UPDATES from pending DELETES (i.e., choose a different error code
for the two cases), because it seems reasonable that code would
want to react differently in those cases (e.g., skip an UPDATE if
there's a pending DELETE).

Does anyone else want to weigh in on this overall approach or narrow
down the alternatives Florian has sketched out above? If we can
reach some consensus on an approach, I can work on a new patch to
implement it.

-Kevin

#15 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#14)
Re: WIP fix proposal for bug #6123

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

Florian Pflug <fgp@phlo.org> wrote:

Hm, OK, I see your point now, I believe. This is not so much about
wanting to put things into BEFORE triggers which don't really
fit there, but instead about avoiding weeks of debugging if someone
messes up.

I would say that is the overriding concern.

I'm going to have to argue with myself -- really, the *biggest*
concern is that the current situation allows data to be quietly
mangled through non-obvious action-at-a-distance. The damage might
not be discovered for a very long time. Debugging the cause of the
mangling, once noticed, is the *next* most significant concern.

-Kevin

#16 Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#14)
Re: WIP fix proposal for bug #6123

On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Would you feel comfortable with a patch which threw an error on
the DELETE case, as it does on the UPDATE case?

Yes, though with a little additional twist. The twist being that
I'd like the error to be thrown earlier, at the time of the
offending UPDATE or DELETE, not later, when the original UPDATE or
DELETE which caused the BEFORE trigger invocation stumbles over
the updated row. It not only seems more logical that way, but it
also makes it possible for the trigger to catch the error, and
react accordingly.

I hadn't thought of that.  It does seem better in every way except
for how much work it takes to do it and how much testing it needs to
have confidence in it.  Definitely not 9.1 material.

IMHO, none of this is 9.1 material. It's been like this forever, and
it sucks, but it doesn't suck particularly more than any of the other
things that we didn't get around to fixing in 9.1. In fact, I'd go so
far as to argue that this would be just about the worst imaginable
type of patch to slip into a release at the last minute. On the one
hand, I think there's a real chance of breaking things in subtle ways
that are difficult to detect; and on the other hand, a significant
percentage of users will get no benefit from any change we make here,
just because this is something that most people don't do. People who
would otherwise be tempted to write their applications this way will
end up not doing so precisely because it currently does not work. So
I think we should focus on getting the right semantics here, rather
than trying to minimize the scope of the change.

I'm not sure I understand the justification for throwing an error.
Updating a row twice is not an error under any other circumstances;
nor is deleting a row you've updated. Why should it be here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17 Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#16)
Re: WIP fix proposal for bug #6123

On Aug 3, 2011, at 04:54, Robert Haas wrote:

On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Would you feel comfortable with a patch which threw an error on
the DELETE case, as it does on the UPDATE case?

Yes, though with a little additional twist. The twist being that
I'd like the error to be thrown earlier, at the time of the
offending UPDATE or DELETE, not later, when the original UPDATE or
DELETE which caused the BEFORE trigger invocation stumbles over
the updated row. It not only seems more logical that way, but it
also makes it possible for the trigger to catch the error, and
react accordingly.

I hadn't thought of that. It does seem better in every way except
for how much work it takes to do it and how much testing it needs to
have confidence in it. Definitely not 9.1 material.

I'm not sure I understand the justification for throwing an error.
Updating a row twice is not an error under any other circumstances;
nor is deleting a row you've updated. Why should it be here?

Because updating or deleting a row from within a BEFORE trigger fired
upon updating or deleting that very row causes two intermingled
UPDATE/DELETE executions, not one UPDATE/DELETE followed by another.

Here's a step-by-step description of what happens in the case of two
UPDATES, assuming that we don't ignore any updates and throw no error.

1) UPDATE T SET ... WHERE ID=1
2) Row with ID=1 is found & locked
3) BEFORE UPDATE triggers are fired, with OLD containing the original
row's contents and NEW the values specified in (1)
4) UPDATE T SET ... WHERE ID=1, executed from within one of the triggers
5) BEFORE UPDATE triggers are fired again, with OLD containing the
original row's contents and NEW the values specified in (4).
6) The row is updated with the values specified in (4), possibly changed
by the triggers in (5)
7) The row is updated with the values specified in (1), possibly changed
by the triggers in (3).

Now, in step (7), we're overwriting the UPDATE from steps 4-6 without
even looking at the row that UPDATE produced. If, for example, both UPDATES
do "counter = counter + 1", we'd end up with the counter incremented by
1, *not* by 2, even though there were two distinct UPDATEs. Or even worse,
the inner UPDATE at (4) might have modified the ID. Then, we'll apply the
outer UPDATE in (7), even though the original WHERE condition from (1)
no longer matches the row.
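
To make the counter example concrete (a hypothetical sketch; the
"bump" flag exists only to request the inner update once and keep
it from recursing):

create table c (id int primary key, counter int not null,
                bump boolean not null default false);

create function c_before_upd() returns trigger as $$
begin
    if new.bump then
        new.bump := false;
        -- The inner "counter = counter + 1" of step (4).
        update c set counter = counter + 1 where id = old.id;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger c_upd before update on c
    for each row execute procedure c_before_upd();

insert into c values (1, 0);

-- The outer "counter = counter + 1" of step (1).
update c set counter = counter + 1, bump = true where id = 1;

select counter from c;  -- 1, not 2: step (7) never sees (6)'s row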

We could attempt to fix that by using the EvalPlanQual machinery to
re-check the WHERE clause and re-compute the new row in (7). However,
EvalPlanQual introduces a lot of conceptual complexity, and can lead
to very surprising results for more complex UPDATES. (There's that whole
self-join problem with EvalPlanQual, for example)

Also, even if we did use EvalPlanQual, we'd still have executed the outer
BEFORE trigger (step 3) with values for OLD and NEW which *don't* match
the row that we actually update in (7). Doing that seems rather hazardous
too - who knows what the BEFORE trigger has used the values for.

The difference between Kevin's original patch and my suggestion is, BTW,
that he made step (7) throw an error while I suggested the error be
thrown already at (4).

best regards,
Florian Pflug

#18 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#16)
Re: WIP fix proposal for bug #6123

Robert Haas <robertmhaas@gmail.com> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Would you feel comfortable with a patch which threw an error on
the DELETE case, as it does on the UPDATE case?

Yes, though with a little additional twist. The twist being that
I'd like the error to be thrown earlier, at the time of the
offending UPDATE or DELETE, not later, when the original UPDATE
or DELETE which caused the BEFORE trigger invocation stumbles
over the updated row. It not only seems more logical that way,
but it also makes it possible for the trigger to catch the
error, and react accordingly.

I hadn't thought of that. It does seem better in every way
except for how much work it takes to do it and how much testing
it needs to have confidence in it. Definitely not 9.1 material.

IMHO, none of this is 9.1 material.

Nobody is currently arguing for including anything to fix this in
9.1. Perhaps I should have been more explicit that in agreeing with
Florian, I'm giving up on the idea of any PostgreSQL release
attempting to do anything about this before 9.2.

That said, this is a data corrupting bug in the view of management
here, based on reports from testers and my explanation of why they
saw that behavior. They are aware that it can be worked around, but
that's not acceptable if there's no reliable way to find all
occurrences of patterns that hit this bug. To keep this effort
alive here, I am using my quick hack for 9.0 and 9.1. Deletes will
behave as they expect, and updates won't quietly discard part of the
work of a transaction -- an error will be thrown in that situation.

It's been like this forever

As have many other data mangling bugs which we fix in minor
releases. Our point here is that it's never been like this in any
product that the Wisconsin Courts has used for triggers, and never
will be.

People who would otherwise be tempted to write their applications
this way will end up not doing so precisely because it currently
does not work.

I can attest to that.

So I think we should focus on getting the right semantics here,
rather than trying to minimize the scope of the change.

Agreed. I thought that's what we were talking about.

I'm not sure I understand the justification for throwing an error.

Nobody has suggested sane semantics for doing otherwise. There is
the pipe dream of what application programmers here would like,
described earlier in the thread. I just don't see that happening,
and I'm not sure it addresses all of Florian's concerns anyway. You
had a suggestion for how to salvage the update situation, but it was
pretty hand-wavy, and I find it very hard to reconcile to some of
the issues raised by Florian. Maybe someone can propose sane
semantics which covers all the objections, and maybe that would even
be possible to implement; but right now Florian's approach seems
like the most solid proposal yet put forward.

Updating a row twice is not an error under any other
circumstances; nor is deleting a row you've updated. Why should
it be here?

As Florian pointed out, it's not exactly a matter of doing those
things. If you have a proposal which addresses the issues raised
earlier in the thread, please share.

-Kevin

#19Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#18)
Re: WIP fix proposal for bug #6123

On Wed, Aug 3, 2011 at 10:48 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

As have many other data mangling bugs which we fix in minor
releases.  Our point here is that it's never been like this in any
product that the Wisconsin Courts has used for triggers, and never
will be.

I don't believe this is very similar at all to the sorts of bugs we
fix in minor releases, but let's leave that aside for the moment since
it seems that we're in violent agreement about which release we're
targeting. With respect to the second part of your comment, it might
be useful to describe how this works in some of those other products.

I'm not sure I understand the justification for throwing an error.

Nobody has suggested sane semantics for doing otherwise.  There is
the pipe dream of what application programmers here would like,
described earlier in the thread.  I just don't see that happening,
and I'm not sure it addresses all of Florian's concerns anyway.  You
had a suggestion for how to salvage the update situation, but it was
pretty hand-wavy, and I find it very hard to reconcile to some of
the issues raised by Florian.  Maybe someone can propose sane
semantics which covers all the objections, and maybe that would even
be possible to implement; but right now Florian's approach seems
like the most solid proposal yet put forward.

It probably is, but I'm not sure we've done enough thinking about it
to be certain that it's the right way forward.

On that note, I think in some ways the problems we're hitting here are
very much like serialization anomalies. If the user updates a tuple
based on its PK and sets some non-PK field to a constant, and while
we're doing that, a BEFORE trigger updates any field in the tuple
other than the PK, then in theory it seems we ought to be able to
reconcile the updates. It feels like the result ought to be the same
as if we'd simply run the BEFORE-trigger update to completion, and
then run the main update. However, if the BEFORE trigger modifies any
columns that the main update examined while constructing its value for
NEW, then the updates can't be serialized. There's no way to get the
same result as if you'd done either one of them first, because they
are inextricably intertwined.
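
To make the distinction concrete, a hypothetical pair of cases (table and
column names invented):

-- Reconcilable: the two updates read and write disjoint columns.
UPDATE orders SET status = 'closed' WHERE id = 7;          -- main update
-- while its BEFORE trigger does the equivalent of:
--     UPDATE orders SET last_touched = now() WHERE id = 7;
-- The combined effect is the same as running the trigger's update to
-- completion first and the main update second.

-- Not reconcilable: the main update *reads* a column the trigger writes.
UPDATE accounts SET balance = balance - 100 WHERE id = 7;  -- reads balance
-- while its BEFORE trigger does the equivalent of:
--     UPDATE accounts SET balance = balance * 2 WHERE id = 7;
-- The main update computed its NEW balance from a row version the trigger
-- has since replaced; no serial order reproduces the intermingled result.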

In practice, my hand-wavy reference to "reconciling the updates" is a
problem because of the way the trigger interface works. It feels
pretty impossible to decide that we're going to do the update, but
with some other random values we dredged up from some other update
replacing some of the ones the user explicitly handed to us. But if
the trigger could return an indication of which column values it
wished to override, then it seems to me that we could piece together a
reasonable set of semantics. It's not exactly clear how to make that
work, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20Robert Haas
robertmhaas@gmail.com
In reply to: Florian Pflug (#17)
Re: WIP fix proposal for bug #6123

On Wed, Aug 3, 2011 at 4:50 AM, Florian Pflug <fgp@phlo.org> wrote:

On Aug3, 2011, at 04:54 , Robert Haas wrote:

On Tue, Aug 2, 2011 at 2:32 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Would you feel comfortable with a patch which threw an error on
the DELETE case, as it does on the UPDATE case?

Yes, though with a little additional twist. The twist being that
I'd like the error to be thrown earlier, at the time of the
offending UPDATE or DELETE, not later, when the original UPDATE or
DELETE which caused the BEFORE trigger invocation stumbles over
the updated row. It not only seems more logical that way, but it
also makes it possible for the trigger to catch the error, and
react accordingly.

I hadn't thought of that.  It does seem better in every way except
for how much work it takes to do it and how much testing it needs to
have confidence in it.  Definitely not 9.1 material.

I'm not sure I understand the justification for throwing an error.
Updating a row twice is not an error under any other circumstances;
nor is deleting a row you've updated.  Why should it be here?

Because updating or deleting a row from within a BEFORE trigger fired
upon updating or deleting that very row causes two intermingled
UPDATE/DELETE executions, not one UPDATE/DELETE followed by another.

[... step-by-step description of the two intermingled UPDATEs snipped;
see #17 above ...]

The difference between Kevin's original patch and my suggestion is, BTW,
that he made step (7) throw an error, while I suggested that the error be
thrown already at (4).

I think Kevin's proposal is better, because AFAICS if the BEFORE
UPDATE trigger returns NULL then we haven't got a problem; and we
can't know that until we get to step 7.
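
For context: a BEFORE row trigger that returns NULL suppresses the
operation for that row. Reusing the hypothetical table and trigger from
the sketch upthread (pg_trigger_depth() again assumed), a trigger such as
the following would trip an error raised at step (4), even though the
conflicting write of step (7) never occurs:

CREATE OR REPLACE FUNCTION t_before_update() RETURNS trigger AS $$
BEGIN
    IF pg_trigger_depth() > 1 THEN
        RETURN NEW;              -- let the inner UPDATE proceed normally
    END IF;
    -- Do the work ourselves (the inner UPDATE, step 4)...
    UPDATE t SET counter = counter + 1 WHERE id = NEW.id;
    -- ...then cancel the outer UPDATE for this row, so the conflicting
    -- write of step (7) never happens.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;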

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#19)
#22Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#20)
#23Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#20)
#24Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Florian Pflug (#22)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Florian Pflug (#21)
#26Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#25)
#27Florian Pflug
fgp@phlo.org
In reply to: Kevin Grittner (#26)
#28Florian Pflug
fgp@phlo.org
In reply to: Florian Pflug (#27)
#29Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Florian Pflug (#27)
#30Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Florian Pflug (#28)
#31Florian Pflug
fgp@phlo.org
In reply to: Kevin Grittner (#30)
#32Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Florian Pflug (#31)
#33Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#4)
#34Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#33)
#35Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#34)
#36Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#34)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#36)
#38Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#37)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Kevin Grittner (#38)
#40Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Alvaro Herrera (#39)
#41Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#40)