Concurrency bug in UPDATE of partition-key
(Mail subject changed; original thread: [1])
On 8 March 2018 at 11:57, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 8 March 2018 at 09:15, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
CREATE TABLE pa_target (key integer, val text)
PARTITION BY LIST (key);
CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);
INSERT INTO pa_target VALUES (1, 'initial1');

session1:
BEGIN;
UPDATE pa_target SET val = val || ' updated by update1' WHERE key = 1;
UPDATE 1
postgres=# SELECT * FROM pa_target ;
key | val
-----+-----------------------------
1 | initial1 updated by update1
(1 row)

session2:
UPDATE pa_target SET val = val || ' updated by update2', key = key + 1 WHERE
key = 1
<blocks>

session1:
postgres=# COMMIT;
COMMIT

<session1 unblocks and completes its UPDATE>
postgres=# SELECT * FROM pa_target ;
key | val
-----+-----------------------------
2 | initial1 updated by update2
(1 row)

Ouch. The committed updates by session1 are overwritten by session2. This
clearly violates the rules that the rest of the system obeys and is not
acceptable IMHO.

Clearly, ExecUpdate() while moving rows between partitions is missing out on
re-constructing the to-be-updated tuple, based on the latest tuple in the
update chain. Instead, it's simply deleting the latest tuple and inserting a
new tuple in the new partition based on the old tuple. That's simply wrong.

You are right. This needs to be fixed. This is a different issue than
the particular one that is being worked upon in this thread, and both
these issues have different fixes.
As discussed above, there is a concurrency bug where during UPDATE of
a partition-key, another transaction T2 updates the same row.
Like you said, the tuple needs to be reconstructed when ExecDelete()
finds that the row has been updated by another transaction. We should
send back this information from ExecDelete() (I think tupleid
parameter gets updated in this case), and then in ExecUpdate() we
should goto lreplace, so that the row is fetched back similar to
how it happens when heap_update() knows that the tuple was updated.
The attached WIP patch is how the fix is turning out to be.
ExecDelete() passes back the epqslot returned by EvalPlanQual().
ExecUpdate() uses this re-fetched tuple to re-process the row, just
like it does when heap_update() returns HeapTupleUpdated.
I need to write an isolation test for this fix.
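For reference, such a test for this scenario might look like the sketch below, in the format of the existing specs in src/test/isolation/specs (the step and session names here are made up for illustration, not taken from the actual patch):

```
setup
{
  CREATE TABLE pa_target (key integer, val text) PARTITION BY LIST (key);
  CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
  CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);
  INSERT INTO pa_target VALUES (1, 'initial1');
}

teardown
{
  DROP TABLE pa_target;
}

session "s1"
step "s1b" { BEGIN; }
step "s1u" { UPDATE pa_target SET val = val || ' updated by update1' WHERE key = 1; }
step "s1c" { COMMIT; }

session "s2"
step "s2u" { UPDATE pa_target SET val = val || ' updated by update2', key = key + 1 WHERE key = 1; }
step "s2s" { SELECT * FROM pa_target; }

# s2u blocks on s1's row lock; after s1 commits, the row moved to part2
# should contain both updates, not just session2's.
permutation "s1b" "s1u" "s2u" "s1c" "s2s"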
[1]: /messages/by-id/CAJ3gD9d=wcbF-G00bYo4v5iWC69zm1UTSNxyRLpOgO9j4AtLWQ@mail.gmail.com
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
fix_concurrency_bug_WIP.patch (application/octet-stream, +44 −9)
On Thu, Mar 8, 2018 at 4:55 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
(Mail subject changed; original thread : [1])
On 8 March 2018 at 11:57, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Clearly, ExecUpdate() while moving rows between partitions is missing out on
re-constructing the to-be-updated tuple, based on the latest tuple in the
update chain. Instead, it's simply deleting the latest tuple and inserting a
new tuple in the new partition based on the old tuple. That's simply wrong.

You are right. This needs to be fixed. This is a different issue than
the particular one that is being worked upon in this thread, and both
these issues have different fixes.

As discussed above, there is a concurrency bug where during UPDATE of
a partition-key, another transaction T2 updates the same row.
We have one more behavior related to concurrency that would be
impacted due to row movement. Basically, now Select .. for Key Share
will block the Update that routes the tuple to a different partition
even if the update doesn't update the key (primary/unique key) column.
Example,
create table foo (a int2, b text) partition by list (a);
create table foo1 partition of foo for values IN (1);
create table foo2 partition of foo for values IN (2);
insert into foo values(1, 'Initial record');
Session-1
---------------
begin;
select * from foo where a=1 for key share;
Session-2
---------------
begin;
update foo set a=1, b= b|| '-> update 1' where a=1;
update foo set a=2, b= b|| '-> update 2' where a=1; --this will block
You can see when the update moves the row to a different partition, it
will block as internally it performs delete. I think as we have
already documented that such an update is internally Delete+Insert,
this behavior is expected, but I think it is better if we update the
docs (Row-level locks section) as well.
In particular, I am talking about the below text in the Row-level locks section [1].
FOR KEY SHARE
Behaves similarly to FOR SHARE, except that the lock is weaker: SELECT
FOR UPDATE is blocked, but not SELECT FOR NO KEY UPDATE. A key-shared
lock blocks other transactions from performing DELETE or any UPDATE
that changes the key values, but not other UPDATE,
[1]: https://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 8 March 2018 at 16:55, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
The attached WIP patch is how the fix is turning out to be.
ExecDelete() passes back the epqslot returned by EvalPlanQual().
ExecUpdate() uses this re-fetched tuple to re-process the row, just
like it does when heap_update() returns HeapTupleUpdated.

I need to write an isolation test for this fix.
Attached patch now has the isolation test included. The only
difference with the earlier WIP patch is about some cosmetic changes,
and some more comments.
[1] : /messages/by-id/CAJ3gD9d=wcbF-G00bYo4v5iWC69zm1UTSNxyRLpOgO9j4AtLWQ@mail.gmail.com
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
fix_concurrency_bug.patch (application/octet-stream, +90 −8)
Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/
In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
fix_concurrency_bug_rebased.patch (application/octet-stream, +73 −10)
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:
Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/
Doesn't this belong to PostgreSQL 11 Open Items [1] or are you proposing it
as a feature enhancement for next version?
In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.
if (!tuple_deleted)
- return NULL;
+ {
+ /*
+ * epqslot will be typically NULL. But when ExecDelete() finds
+ * that another transaction has concurrently updated the same
+ * row, it re-fetches the row, skips the delete, and epqslot is
+ * set to the re-fetched tuple slot. In that case, we need to
+ * do all the checks again.
+ */
+ if (TupIsNull(epqslot))
+ return NULL;
+ else
+ {
+ slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot);
+ tuple = ExecMaterializeSlot(slot);
+ goto lreplace;
+ }
+ }
I think this will allow before row delete triggers to be fired more than
once. Normally, if the EvalPlanQual testing generates a new tuple, we
don't fire the triggers again. Now, one possibility could be that we don't
skip the delete after a concurrent update and still allow inserts to use a
new tuple generated by EvalPlanQual testing of delete. However, I think we
need to perform rechecks for update to check if the modified tuple still
requires to be moved to new partition, right or do you have some other
reason in mind?
[1]: https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/

Doesn't this belong to PostgreSQL 11 Open Items [1] or are you proposing it
as a feature enhancement for next version?
as a feature enhancement for next version?
Yes, it needs to be in the Open items. I will have it added in that list.
In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.if (!tuple_deleted) - return NULL; + { + /* + * epqslot will be typically NULL. But when ExecDelete() finds + * that another transaction has concurrently updated the same + * row, it re-fetches the row, skips the delete, and epqslot is + * set to the re-fetched tuple slot. In that case, we need to + * do all the checks again. + */ + if (TupIsNull(epqslot)) + return NULL; + else + { + slot = ExecFilterJunk(resultRelInfo->ri_junkFilter, epqslot); + tuple = ExecMaterializeSlot(slot); + goto lreplace; + } + }I think this will allow before row delete triggers to be fired more than
once. Normally, if the EvalPlanQual testing generates a new tuple, we don't
fire the triggers again.
If there are BR delete triggers, the tuple will be locked using
GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
run, since the tuple is already locked due to triggers having run.
But that leads me to think : The same concurrency issue can occur in
GetTupleForTrigger() also. Say, a concurrent session has already
locked the tuple, and GetTupleForTrigger() would wait and then return
the updated tuple in its last parameter newSlot. In that case, we need
to pass this slot back through ExecBRDeleteTriggers(), and further
through epqslot parameter of ExecDelete(). But yes, in this case, we
should avoid calling this trigger function the second time.
If you agree on the above, I will send an updated patch.
Now, one possibility could be that we don't skip
the delete after a concurrent update and still allow inserts to use a new
tuple generated by EvalPlanQual testing of delete. However, I think we need
to perform rechecks for update to check if the modified tuple still requires
to be moved to new partition, right or do you have some other reason in
mind?
Yes, that's the reason we need to start again from lreplace; i.e., for
checking not just the partition constraints but all the other checks
that were required earlier, because the tuple has been modified. It
may even end up moving to a different partition than the one chosen
earlier.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:
Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/

In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.
/*
+ * If this is part of an UPDATE of partition-key, the
+ * epq tuple will contain the changes from this
+ * transaction over and above the updates done by the
+ * other transaction. The caller should now use this
+ * tuple as its NEW tuple, rather than the earlier NEW
+ * tuple.
+ */
+ if (epqslot)
+ {
+ *epqslot = my_epqslot;
+ return NULL;
+ }
I think we need a similar fix if there is a BR DELETE trigger and
ExecDelete is blocked on heap_lock_tuple because a concurrent transaction
is updating the same row. In such a case it would have already got
the final tuple, so heap_delete will return HeapTupleMayBeUpdated.
Below test can reproduce the issue.
CREATE TABLE pa_target (key integer, val text) PARTITION BY LIST (key);
CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);
CREATE TABLE deleted_row (count int);
CREATE OR REPLACE FUNCTION br_delete() RETURNS trigger AS
$$BEGIN
insert into deleted_row values(OLD.key);
RETURN OLD;
END;$$ LANGUAGE plpgsql;
CREATE TRIGGER test_br_trig BEFORE DELETE ON part1 FOR EACH ROW EXECUTE
PROCEDURE br_delete();
INSERT INTO pa_target VALUES (1, 'initial1');
session1:
postgres=# BEGIN;
BEGIN
postgres=# UPDATE pa_target SET val = val || ' updated by update1' WHERE
key = 1;
UPDATE 1
session2:
postgres=# UPDATE pa_target SET val = val || ' updated by update2', key =
key + 1 WHERE key =1;
<blocks>
session1:
postgres=# commit;
COMMIT
UPDATE 1
postgres=# select * from pa_target ;
key | val
-----+-----------------------------
2 | initial1 updated by update2 --> session1's update is overwritten.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jun 8, 2018 at 2:23 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/

In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.

/*
+ * If this is part of an UPDATE of partition-key, the
+ * epq tuple will contain the changes from this
+ * transaction over and above the updates done by the
+ * other transaction. The caller should now use this
+ * tuple as its NEW tuple, rather than the earlier NEW
+ * tuple.
+ */
+ if (epqslot)
+ {
+     *epqslot = my_epqslot;
+     return NULL;
+ }

I think we need a similar fix if there is a BR DELETE trigger and
ExecDelete is blocked on heap_lock_tuple because a concurrent transaction
is updating the same row. In such a case it would have already got
the final tuple, so heap_delete will return HeapTupleMayBeUpdated.
Amit Kapila had pointed (offlist) that you are already trying to fix this
issue. I have one more question: suppose the first time we come for
ExecDelete we fire the BR delete trigger and then realize that we need to
retry because of the concurrent update. Now, during the retry, we realize
that because of the latest value we don't need to route the tuple to a
different partition, so now we will call ExecUpdate and may fire the
BRUpdate triggers as well?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 8 June 2018 at 16:44, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Jun 8, 2018 at 2:23 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/

In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.

/*
+ * If this is part of an UPDATE of partition-key, the
+ * epq tuple will contain the changes from this
+ * transaction over and above the updates done by the
+ * other transaction. The caller should now use this
+ * tuple as its NEW tuple, rather than the earlier NEW
+ * tuple.
+ */
+ if (epqslot)
+ {
+     *epqslot = my_epqslot;
+     return NULL;
+ }

I think we need a similar fix if there is a BR DELETE trigger and
ExecDelete is blocked on heap_lock_tuple because a concurrent transaction
is updating the same row. In such a case it would have already got
the final tuple, so heap_delete will return HeapTupleMayBeUpdated.

Amit Kapila had pointed (offlist) that you are already trying to fix this
issue.
Yes, his comment about triggers made me realize the trigger issue
which you mentioned, about which I also explained in the earlier reply.
I have one more question: suppose the first time we come for
ExecDelete and fire the BR delete trigger and then realize that we need to
retry because of the concurrent update. Now, during the retry, we realize
that because of the latest value we don't need to route the tuple to a
different partition, so now we will call ExecUpdate and may fire the
BRUpdate triggers as well?
No, we don't fire the BR update trigger again. We go to lreplace
label, and BR trigger code is above that label. The reason we don't
need to run the trigger check again is because if we had already fired
it before, the row must have been locked, so concurrency scenario
won't arise in the first place.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Jun 11, 2018 at 10:19 AM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:
On 8 June 2018 at 16:44, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Jun 8, 2018 at 2:23 PM, Dilip Kumar <dilipbalaut@gmail.com>
wrote:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

Attached is a rebased patch version. Also included it in the upcoming
commitfest :
https://commitfest.postgresql.org/18/1660/

In the rebased version, the new test cases are added in the existing
isolation/specs/partition-key-update-1.spec test.

/*
+ * If this is part of an UPDATE of partition-key, the
+ * epq tuple will contain the changes from this
+ * transaction over and above the updates done by the
+ * other transaction. The caller should now use this
+ * tuple as its NEW tuple, rather than the earlier NEW
+ * tuple.
+ */
+ if (epqslot)
+ {
+     *epqslot = my_epqslot;
+     return NULL;
+ }

I think we need a similar fix if there is a BR DELETE trigger and
ExecDelete is blocked on heap_lock_tuple because a concurrent transaction
is updating the same row. In such a case it would have already got
the final tuple, so heap_delete will return HeapTupleMayBeUpdated.
Amit Kapila had pointed (offlist) that you are already trying to fix this
issue.

Yes, his comment about triggers made me realize the trigger issue
which you mentioned, about which I also explained in the earlier reply.

I have one more question: suppose the first time we come for
ExecDelete we fire the BR delete trigger and then realize that we need to
retry because of the concurrent update. Now, during the retry, we realize
that because of the latest value we don't need to route the tuple to a
different partition, so now we will call ExecUpdate and may fire the
BRUpdate triggers as well?
No, we don't fire the BR update trigger again. We go to lreplace
label, and BR trigger code is above that label. The reason we don't
need to run the trigger check again is because if we had already fired
it before, the row must have been locked, so concurrency scenario
won't arise in the first place.
Ok, that makes sense.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 7, 2018 at 1:53 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

I think this will allow before row delete triggers to be fired more than
once. Normally, if the EvalPlanQual testing generates a new tuple, we don't
fire the triggers again.

If there are BR delete triggers, the tuple will be locked using
GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
run, since the tuple is already locked due to triggers having run.

But that leads me to think: the same concurrency issue can occur in
GetTupleForTrigger() also. Say, a concurrent session has already
locked the tuple, and GetTupleForTrigger() would wait and then return
the updated tuple in its last parameter newSlot. In that case, we need
to pass this slot back through ExecBRDeleteTriggers(), and further
through the epqslot parameter of ExecDelete(). But yes, in this case, we
should avoid calling this trigger function the second time.

If you agree on the above, I will send an updated patch.
Sounds reasonable to me.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 11, 2018 at 3:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 7, 2018 at 1:53 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

I think this will allow before row delete triggers to be fired more than
once. Normally, if the EvalPlanQual testing generates a new tuple, we don't
fire the triggers again.

If there are BR delete triggers, the tuple will be locked using
GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
run, since the tuple is already locked due to triggers having run.

But that leads me to think: the same concurrency issue can occur in
GetTupleForTrigger() also. Say, a concurrent session has already
locked the tuple, and GetTupleForTrigger() would wait and then return
the updated tuple in its last parameter newSlot. In that case, we need
to pass this slot back through ExecBRDeleteTriggers(), and further
through the epqslot parameter of ExecDelete(). But yes, in this case, we
should avoid calling this trigger function the second time.

If you agree on the above, I will send an updated patch.
Sounds reasonable to me.
Try to add a test case which covers BR trigger code path where you are
planning to update.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 11 June 2018 at 15:29, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jun 11, 2018 at 3:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 7, 2018 at 1:53 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 7 June 2018 at 11:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 5, 2018 at 8:03 PM, Amit Khandekar <amitdkhan.pg@gmail.com>
wrote:

I think this will allow before row delete triggers to be fired more than
once. Normally, if the EvalPlanQual testing generates a new tuple, we don't
fire the triggers again.

If there are BR delete triggers, the tuple will be locked using
GetTupleForTrigger(). So the subsequent EvalPlanQual testing won't be
run, since the tuple is already locked due to triggers having run.

But that leads me to think: the same concurrency issue can occur in
GetTupleForTrigger() also. Say, a concurrent session has already
locked the tuple, and GetTupleForTrigger() would wait and then return
the updated tuple in its last parameter newSlot. In that case, we need
to pass this slot back through ExecBRDeleteTriggers(), and further
through the epqslot parameter of ExecDelete(). But yes, in this case, we
should avoid calling this trigger function the second time.

If you agree on the above, I will send an updated patch.
Sounds reasonable to me.
Attached is the v2 version of the patch. It contains the above
trigger-related issue fixed.
The updated tuple is passed back using the existing newslot parameter
of GetTupleForTrigger(). When ExecBRDeleteTriggers() is called with a
new epqslot parameter, it means the caller wants to skip the trigger
execution, because the updated tuple needs to be checked again for
constraints. I have added comments about this behaviour in the
ExecBRDeleteTriggers() function header.
Because the trigger execution is skipped the first time ExecDelete()
is called, I didn't have to explicitly add any changes for preventing
trigger execution the second time.
Try to add a test case which covers BR trigger code path where you are
planning to update.
Added the test cases. Also added cases where the concurrent session
causes the EvalPlanQual() test to fail, so the partition key update
does not happen.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
fix_concurrency_bug_v2.patch (application/octet-stream, +205 −13)
On Mon, Jun 18, 2018 at 10:21 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Attached is v2 version of the patch. It contains the above
trigger-related issue fixed.

The updated tuple is passed back using the existing newslot parameter
of GetTupleForTrigger(). When ExecBRDeleteTriggers() is called with a
new epqslot parameter, it means the caller wants to skip the trigger
execution, because the updated tuple needs to be checked again for
constraints. I have added comments about this behaviour in the
ExecBRDeleteTriggers() function header.
Thanks for the updated patch. I have verified the BR trigger
behaviour; it's working fine with the patch.
+ CREATE FUNCTION func_footrg() RETURNS TRIGGER AS $$
+ BEGIN
+ RETURN OLD;
+ END $$ LANGUAGE PLPGSQL;
+ CREATE TRIGGER footrg_ondel BEFORE DELETE ON footrg1
+ FOR EACH ROW EXECUTE PROCEDURE func_footrg();
Should we also create a test case where we can verify that some
unnecessary or duplicate triggers are not executed? For example, in
the above trigger function, we could maintain a count in some table (e.g.,
how many times the delete trigger got executed) and after the test is
over we can verify the same.
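One way to write such a counting trigger (an illustrative SQL sketch only, not taken from the patch; the trig_count table and count_br_delete function names are made up) would be:

```sql
CREATE TABLE trig_count (fired int);
INSERT INTO trig_count VALUES (0);

CREATE OR REPLACE FUNCTION count_br_delete() RETURNS trigger AS $$
BEGIN
    -- bump the counter every time the BR DELETE trigger runs
    UPDATE trig_count SET fired = fired + 1;
    RETURN OLD;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER footrg_count BEFORE DELETE ON footrg1
    FOR EACH ROW EXECUTE PROCEDURE count_br_delete();

-- After the concurrent partition-key UPDATE completes, the counter
-- should show the trigger fired exactly once, not twice:
SELECT fired FROM trig_count;
```

The final SELECT in the spec output would then make any duplicate trigger execution visible.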
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Jun 18, 2018 at 10:21 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Attached is v2 version of the patch. It contains the above
trigger-related issue fixed.

The updated tuple is passed back using the existing newslot parameter
of GetTupleForTrigger(). When ExecBRDeleteTriggers() is called with a
new epqslot parameter, it means the caller wants to skip the trigger
execution, because the updated tuple needs to be checked again for
constraints. I have added comments about this behaviour in the
ExecBRDeleteTriggers() function header.

Thanks for the updated patch. I have verified the BR trigger
behaviour; it's working fine with the patch.

+ CREATE FUNCTION func_footrg() RETURNS TRIGGER AS $$
+ BEGIN
+     RETURN OLD;
+ END $$ LANGUAGE PLPGSQL;
+ CREATE TRIGGER footrg_ondel BEFORE DELETE ON footrg1
+     FOR EACH ROW EXECUTE PROCEDURE func_footrg();

Should we also create a test case where we can verify that some
unnecessary or duplicate triggers are not executed?
I am not sure how much value we will add by having such a test. In
general, it is good to have tests that cover various aspects of
functionality, but OTOH, we have to be careful to not overdo it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 18 June 2018 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Should we also create a test case where we can verify that some
unnecessary or duplicate triggers are not executed?

I am not sure how much value we will add by having such a test. In
general, it is good to have tests that cover various aspects of
functionality, but OTOH, we have to be careful to not overdo it.
Actually I am thinking, it's not a big deal adding a RAISE statement
to the trigger function in the existing test cases. It will clearly show
how many times the trigger has executed. So I will go ahead and do that.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Jun 18, 2018 at 6:19 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 18 June 2018 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Should we also create a test case where we can verify that some
unnecessary or duplicate triggers are not executed?

I am not sure how much value we will add by having such a test. In
general, it is good to have tests that cover various aspects of
functionality, but OTOH, we have to be careful to not overdo it.

Actually I am thinking, it's not a big deal adding a RAISE statement
to the trigger function in the existing test cases. It will clearly show
how many times the trigger has executed. So I will go ahead and do that.
Ok, that makes sense to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On 19 June 2018 at 13:06, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Jun 18, 2018 at 6:19 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 18 June 2018 at 17:56, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jun 18, 2018 at 11:28 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Should we also create a test case where we can verify that some
unnecessary or duplicate triggers are not executed?

I am not sure how much value we will add by having such a test. In
general, it is good to have tests that cover various aspects of
functionality, but OTOH, we have to be careful to not overdo it.

Actually I am thinking, it's not a big deal adding a RAISE statement
to the trigger function in the existing test cases. It will clearly show
how many times the trigger has executed. So I will go ahead and do that.

Ok, that makes sense to me.
Could not add a RAISE statement, because the isolation test does not
seem to print those statements in the output. Instead, I have dumped
some rows into a new table through the trigger function, and did a
select from that table. Attached is the v3 patch.
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Attachments:
fix_concurrency_bug_v3.patch (application/octet-stream, +222 −13)
On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Could not add a RAISE statement, because the isolation test does not
seem to print those statements in the output. Instead, I have dumped
some rows into a new table through the trigger function, and did a
select from that table. Attached is the v3 patch.
Comments
-----------------
1.
+ /*
+ * If the tuple was concurrently updated and the caller of this
+ * function requested for the updated tuple, skip the trigger
+ * execution.
+ */
+ if (newSlot != NULL && epqslot != NULL)
+ return false;
There is a possibility of a memory leak due to the above change. Basically,
GetTupleForTrigger returns a newly allocated tuple. We should free
the trigger tuple by calling heap_freetuple(trigtuple).
2.
ExecBRDeleteTriggers(..)
{
..
+ /* If requested, pass back the concurrently updated tuple, if any. */
+ if (epqslot != NULL)
+ *epqslot = newSlot;
+
if (trigtuple == NULL)
return false;
+
+ /*
+ * If the tuple was concurrently updated and the caller of this
+ * function requested for the updated tuple, skip the trigger
+ * execution.
+ */
+ if (newSlot != NULL && epqslot != NULL)
+ return false;
..
}
Can't we merge the first change into second, something like:
if (newSlot != NULL && epqslot != NULL)
{
*epqslot = newSlot;
return false;
}
3.
ExecBRDeleteTriggers(..)
{
- TupleTableSlot *newSlot;
int i;
Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
if (fdw_trigtuple == NULL)
{
+ TupleTableSlot *newSlot;
+
..
}
This change is okay on its own, but I think it has nothing to do with
this patch. If you don't mind, can we remove it?
4.
+/* ----------
+ * ExecBRDeleteTriggers()
+ *
+ * Called to execute BEFORE ROW DELETE triggers.
+ *
+ * Returns false in following scenarios :
+ * 1. Trigger function returned NULL.
+ * 2. The tuple was concurrently deleted OR it was concurrently updated and the
+ * new tuple didn't pass EvalPlanQual() test.
+ * 3. The tuple was concurrently updated and the new tuple passed the
+ * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, this
+ * function skips the trigger function execution, because the caller has
+ * indicated that it wants to further process the updated tuple. The updated
+ * tuple slot is passed back through epqslot.
+ *
+ * In all other cases, returns true.
+ * ----------
+ */
bool
ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,
The above comments appear unrelated to this function; for example, this
function is not at all aware of concurrent updates. I think if you
want to add comments to this function, we can add them as a separate
patch, or if you want to add them as part of this patch, at least make
them succinct.
5.
+ /*
+ * If this is part of an UPDATE of partition-key, the
+ * epq tuple will contain the changes from this
+ * transaction over and above the updates done by the
+ * other transaction. The caller should now use this
+ * tuple as its NEW tuple, rather than the earlier NEW
+ * tuple.
+ */
I think we can simplify this comment as "If requested, pass back the
updated tuple if any.", something similar to what you have at some
other place in the patch.
6.
+# Session A is moving a row into another partition, but is waiting for
+# another session B that is updating the original row. The row that ends up
+# in the new partition should contain the changes made by session B.
You have named sessions as s1 and s2, but the description uses A and B; it
will be better to use s1 and s2 respectively.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 20 June 2018 at 18:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Could not add a RAISE statement, because the isolation test does not
seem to print those statements in the output. Instead, I have dumped
some rows in a new table through the trigger function, and did a
select from that table. Attached is v3 patch.

Comments
-----------------

1.
+ /*
+ * If the tuple was concurrently updated and the caller of this
+ * function requested for the updated tuple, skip the trigger
+ * execution.
+ */
+ if (newSlot != NULL && epqslot != NULL)
+ return false;

There is a possibility of memory leak due to above change. Basically,
GetTupleForTrigger returns a newly allocated tuple. We should free
the triggertuple by calling heap_freetuple(trigtuple).
Right, done.
2.
ExecBRDeleteTriggers(..)
{
..
+ /* If requested, pass back the concurrently updated tuple, if any. */
+ if (epqslot != NULL)
+ *epqslot = newSlot;
+
if (trigtuple == NULL)
return false;
+
+ /*
+ * If the tuple was concurrently updated and the caller of this
+ * function requested for the updated tuple, skip the trigger
+ * execution.
+ */
+ if (newSlot != NULL && epqslot != NULL)
+ return false;
..
}

Can't we merge the first change into second, something like:
if (newSlot != NULL && epqslot != NULL)
{
*epqslot = newSlot;
return false;
}
We want to update *epqslot with whatever value newSlot has. So if
newSlot is NULL, *epqslot should be set to NULL as well. Hence we can't
move the assignment inside the
"if (newSlot != NULL && epqslot != NULL)" block.
3.
ExecBRDeleteTriggers(..)
{
- TupleTableSlot *newSlot;
int i;

Assert(HeapTupleIsValid(fdw_trigtuple) ^ ItemPointerIsValid(tupleid));
if (fdw_trigtuple == NULL)
{
+ TupleTableSlot *newSlot;
+
..
}

This change is okay on its own, but I think it has nothing to do with
this patch. If you don't mind, can we remove it?
Done.
4.
+/* ----------
+ * ExecBRDeleteTriggers()
+ *
+ * Called to execute BEFORE ROW DELETE triggers.
+ *
+ * Returns false in following scenarios :
+ * 1. Trigger function returned NULL.
+ * 2. The tuple was concurrently deleted OR it was concurrently updated and the
+ * new tuple didn't pass EvalPlanQual() test.
+ * 3. The tuple was concurrently updated and the new tuple passed the
+ * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, this
+ * function skips the trigger function execution, because the caller has
+ * indicated that it wants to further process the updated tuple. The updated
+ * tuple slot is passed back through epqslot.
+ *
+ * In all other cases, returns true.
+ * ----------
+ */
bool
ExecBRDeleteTriggers(EState *estate, EPQState *epqstate,

The above comments appear unrelated to this function; for example, this
function is not at all aware of concurrent updates.
But with the patch, this function does become aware of concurrent
updates, because it makes use of the epqslot returned by
GetTupleForTrigger. This is similar to ExecBRUpdateTriggers() being
aware of concurrent updates.
The reason I added the return-value-related comments is that earlier
the behaviour was simple: if the trigger function returned NULL, this
function returns false. But now that has changed, so it's better if the
change is noted in the function header. And on HEAD there is no
function header, so we need to explain when exactly this function
returns true or false.
I think if you want to add comments to this function, we can add them as a separate
patch or if you want to add them as part of this patch at least make
them succinct.
If it looks complicated, can you please suggest something to make it a
bit simpler?
5.
+ /*
+ * If this is part of an UPDATE of partition-key, the
+ * epq tuple will contain the changes from this
+ * transaction over and above the updates done by the
+ * other transaction. The caller should now use this
+ * tuple as its NEW tuple, rather than the earlier NEW
+ * tuple.
+ */

I think we can simplify this comment as "If requested, pass back the
updated tuple if any.", something similar to what you have at some
other place in the patch.
OK, done. Also changed the subsequent comment to:
/* Normal DELETE: So delete the updated row. */
6.
+# Session A is moving a row into another partition, but is waiting for
+# another session B that is updating the original row. The row that ends up
+# in the new partition should contain the changes made by session B.

You have named sessions as s1 and s2, but the description uses A and B; it
will be better to use s1 and s2 respectively.
Done.
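For readers unfamiliar with the isolation tester, the spec might look roughly like this. This is a hypothetical sketch reconstructed from the scenario quoted at the top of the thread, not the actual test in the patch; it simply shows how naming the sessions s1 and s2 lets the header comment refer to them directly:

```
# s1 is moving a row into another partition, but is waiting for
# s2, which is updating the original row. The row that ends up in the
# new partition should contain the changes made by s2.

setup
{
  CREATE TABLE pa_target (key integer, val text) PARTITION BY LIST (key);
  CREATE TABLE part1 PARTITION OF pa_target FOR VALUES IN (1);
  CREATE TABLE part2 PARTITION OF pa_target FOR VALUES IN (2);
  INSERT INTO pa_target VALUES (1, 'initial');
}

teardown
{
  DROP TABLE pa_target;
}

session "s1"
step "s1b" { BEGIN ISOLATION LEVEL READ COMMITTED; }
step "s1u" { UPDATE pa_target SET key = key + 1, val = val || ' by s1' WHERE key = 1; }
step "s1c" { COMMIT; }

session "s2"
step "s2b" { BEGIN ISOLATION LEVEL READ COMMITTED; }
step "s2u" { UPDATE pa_target SET val = val || ' by s2' WHERE key = 1; }
step "s2c" { COMMIT; }

permutation "s1b" "s2b" "s2u" "s1u" "s2c" "s1c"
```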
Attached is v4 version of the patch. Thanks !
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company