a misbehavior of partition row movement (?)

Started by Amit Langoteover 5 years ago73 messageshackers
Jump to latest
#1Amit Langote
Langote_Amit_f8@lab.ntt.co.jp

Hi,

Robert forwarded me a pgsql-general thread [1]/messages/by-id/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com where a ON DELETE
CASCADE specified on a foreign key pointing to a partitioned table is
shown to cause a possibly surprising end result during an update of
the partitioned table. Example from that thread:

create table parent ( id serial, constraint parent_pkey primary key
(id)) partition by range (id);
create table parent_10 partition of parent for values from (0) to (10);
create table parent_20 partition of parent for values from (11) to (20);
create table child (id serial, parent_id int constraint parent_id_fk
references parent(id) on update cascade on delete cascade);
insert into parent values(0);
insert into child values(1,0);
update parent set id = 5; -- no row movement, so normal update
table parent;
id
----
5
(1 row)

table child;
id | parent_id
----+-----------
1 | 5
(1 row)

update parent set id = 15; -- row movement, so delete+insert
table parent;
id
----
15
(1 row)

table child; -- ON DELETE CASCADE having done its job
id | parent_id
----+-----------
(0 rows)

Reporter on that thread says that the last update should have failed
and I don't quite see a workable alternative to that. What we could
do is check before calling ExecDelete() that will perform the DELETE
part of the row movement if the foreign key action trigger that
implements the ON DELETE CASCADE action (an internal trigger) is among
the AR delete triggers that will run as part of that DELETE. If yes,
abort the operation. See attached a patch for that. I'm not terribly
happy with the error and details messages though:

update parent set id = 15;
ERROR: cannot move row being updated to another partition
DETAIL: Moving the row may cause a foreign key involving the source
partition to be violated.

Thoughts?

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: /messages/by-id/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com

Attachments:

prevent-row-movement-on-delete-cascade.patchapplication/octet-stream; name=prevent-row-movement-on-delete-cascade.patchDownload+66-0
#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Langote (#1)

On Friday, October 2, 2020, Amit Langote <amitlangote09@gmail.com> wrote:

Reporter on that thread says that the last update should have failed
and I don't quite see a workable alternative to that.

To be clear the OP would rather have it just work, the same as the
non-row-movement version. Maybe insert the new row first, execute the on
update trigger chained from the old row, then delete the old row?

David J.

#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David G. Johnston (#2)
Re: a misbehavior of partition row movement (?)

On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Friday, October 2, 2020, Amit Langote <amitlangote09@gmail.com> wrote:

Reporter on that thread says that the last update should have failed
and I don't quite see a workable alternative to that.

To be clear the OP would rather have it just work, the same as the non-row-movement version. Maybe insert the new row first, execute the on update trigger chained from the old row, then delete the old row?

I was thinking yesterday about making it just work, but considering
the changes that would need to be made to how the underlying triggers
fire, it does not seem we would be able to back-port the solution.

--
Amit Langote
EDB: http://www.enterprisedb.com

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Amit Langote (#3)
Re: a misbehavior of partition row movement (?)

On Sat, Oct 03, 2020 at 11:42:21AM +0900, Amit Langote wrote:

On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Friday, October 2, 2020, Amit Langote <amitlangote09@gmail.com>
wrote:

Reporter on that thread says that the last update should have failed
and I don't quite see a workable alternative to that.

To be clear the OP would rather have it just work, the same as the
non-row-movement version. Maybe insert the new row first, execute
the on update trigger chained from the old row, then delete the old
row?

I was thinking yesterday about making it just work, but considering the
changes that would need to be made to how the underlying triggers fire,
it does not seem we would be able to back-port the solution.

I think we need to differentiate between master and backbranches. IMO we
should try to make it "just work" in master, and the amount of code
should not be an issue there I think (no opinion on whether insert and
update trigger is the way to go). For backbranches we may need to do
something less intrusive, of course.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tomas Vondra (#4)
Re: a misbehavior of partition row movement (?)

On Sat, Oct 3, 2020 at 8:15 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote

On Sat, Oct 03, 2020 at 11:42:21AM +0900, Amit Langote wrote:

On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Friday, October 2, 2020, Amit Langote <amitlangote09@gmail.com>
wrote:

Reporter on that thread says that the last update should have failed
and I don't quite see a workable alternative to that.

To be clear the OP would rather have it just work, the same as the
non-row-movement version. Maybe insert the new row first, execute
the on update trigger chained from the old row, then delete the old
row?

I was thinking yesterday about making it just work, but considering the
changes that would need to be made to how the underlying triggers fire,
it does not seem we would be able to back-port the solution.

I think we need to differentiate between master and backbranches. IMO we
should try to make it "just work" in master, and the amount of code
should not be an issue there I think (no opinion on whether insert and
update trigger is the way to go). For backbranches we may need to do
something less intrusive, of course.

Sure, that makes sense. I will try making a patch for HEAD to make it
just work unless someone beats me to it.

--
Amit Langote
EDB: http://www.enterprisedb.com

#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#5)
Re: a misbehavior of partition row movement (?)

On Sat, Oct 3, 2020 at 8:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Oct 3, 2020 at 8:15 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote

On Sat, Oct 03, 2020 at 11:42:21AM +0900, Amit Langote wrote:

On Fri, Oct 2, 2020 at 11:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Friday, October 2, 2020, Amit Langote <amitlangote09@gmail.com>
wrote:

Reporter on that thread says that the last update should have failed
and I don't quite see a workable alternative to that.

To be clear the OP would rather have it just work, the same as the
non-row-movement version. Maybe insert the new row first, execute
the on update trigger chained from the old row, then delete the old
row?

I was thinking yesterday about making it just work, but considering the
changes that would need to be made to how the underlying triggers fire,
it does not seem we would be able to back-port the solution.

I think we need to differentiate between master and backbranches. IMO we
should try to make it "just work" in master, and the amount of code
should not be an issue there I think (no opinion on whether insert and
update trigger is the way to go). For backbranches we may need to do
something less intrusive, of course.

Sure, that makes sense. I will try making a patch for HEAD to make it
just work unless someone beats me to it.

After working on this for a while, here is my proposal for HEAD.

To reiterate, the problem occurs when an UPDATE of a partitioned PK
table is turned into a DELETE + INSERT. In that case, the UPDATE RI
triggers are not fired at all, but DELETE ones are, so the foreign key
may fail to get enforced correctly. For example, even though the new
row after the update is still logically present in the PK table, it
would wrongly get deleted because of the DELETE RI trigger firing if
there's a ON DELETE CASCADE clause on the foreign key.

To fix that, I propose that we teach trigger.c to skip queuing the
events that would be dangerous to fire, such as that for the DELETE on
the source leaf partition mentioned above. Instead, queue an UPDATE
event on the root target table, matching the actual operation being
performed. Note though that this new arrangement doesn't affect the
firing of any other triggers except those that are relevant to the
reported problem, viz. the PK-side RI triggers. All other triggers
fire exactly as they did before.

To make that happen, I had to:

1. Make RI triggers available on partitioned tables at all, which they
are not today. Also, mark the RI triggers in partitions correctly as
*clones* of the RI triggers in their respective parents.

2. Make it possible to allow AfterTriggerSaveEvent() to access the
query's actual target relation, that is, in addition to the target
relation on which an event was fired. Also, added a bunch of code to
AFTER TRIGGER infrastructure to handle events fired on partitioned
tables. Because those events cannot contain physical references to
affected tuples, I generalized the code currently used to handle after
triggers on foreign tables by storing the tuples in and retrieving
them from a tuple store. I read a bunch of caveats of that
implementation (such as its uselessness for deferred triggers), but
for the limited cases for which it will be used for partitioned
tables, it seems safe, because it won't be used for deferred triggers
on partitioned tables.

Attached patches 0001 and 0002 implement 1 and 2, respectively.
Later, I will post an updated version of the patch for the
back-branches, which, as mentioned upthread, is to prevent the
cross-partition updates on foreign key PK tables.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Create-foreign-key-triggers-in-partitioned-tables.patchapplication/octet-stream; name=v1-0001-Create-foreign-key-triggers-in-partitioned-tables.patchDownload+407-84
v1-0002-Enforce-foreign-key-correctly-during-cross-partit.patchapplication/octet-stream; name=v1-0002-Enforce-foreign-key-correctly-during-cross-partit.patchDownload+569-72
#7Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#6)
Re: a misbehavior of partition row movement (?)

On Fri, Nov 20, 2020 at 8:55 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Oct 3, 2020 at 8:26 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Sat, Oct 3, 2020 at 8:15 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote

I think we need to differentiate between master and backbranches. IMO we
should try to make it "just work" in master, and the amount of code
should not be an issue there I think (no opinion on whether insert and
update trigger is the way to go). For backbranches we may need to do
something less intrusive, of course.

Sure, that makes sense. I will try making a patch for HEAD to make it
just work unless someone beats me to it.

After working on this for a while, here is my proposal for HEAD.

To reiterate, the problem occurs when an UPDATE of a partitioned PK
table is turned into a DELETE + INSERT. In that case, the UPDATE RI
triggers are not fired at all, but DELETE ones are, so the foreign key
may fail to get enforced correctly. For example, even though the new
row after the update is still logically present in the PK table, it
would wrongly get deleted because of the DELETE RI trigger firing if
there's a ON DELETE CASCADE clause on the foreign key.

To fix that, I propose that we teach trigger.c to skip queuing the
events that would be dangerous to fire, such as that for the DELETE on
the source leaf partition mentioned above. Instead, queue an UPDATE
event on the root target table, matching the actual operation being
performed. Note though that this new arrangement doesn't affect the
firing of any other triggers except those that are relevant to the
reported problem, viz. the PK-side RI triggers. All other triggers
fire exactly as they did before.

To make that happen, I had to:

1. Make RI triggers available on partitioned tables at all, which they
are not today. Also, mark the RI triggers in partitions correctly as
*clones* of the RI triggers in their respective parents.

2. Make it possible to allow AfterTriggerSaveEvent() to access the
query's actual target relation, that is, in addition to the target
relation on which an event was fired. Also, added a bunch of code to
AFTER TRIGGER infrastructure to handle events fired on partitioned
tables. Because those events cannot contain physical references to
affected tuples, I generalized the code currently used to handle after
triggers on foreign tables by storing the tuples in and retrieving
them from a tuple store. I read a bunch of caveats of that
implementation (such as its uselessness for deferred triggers), but
for the limited cases for which it will be used for partitioned
tables, it seems safe, because it won't be used for deferred triggers
on partitioned tables.

Attached patches 0001 and 0002 implement 1 and 2, respectively.
Later, I will post an updated version of the patch for the
back-branches, which, as mentioned upthread, is to prevent the
cross-partition updates on foreign key PK tables.

I have created a CF entry for this:

https://commitfest.postgresql.org/31/2877/

--
Amit Langote
EDB: http://www.enterprisedb.com

#8Arne Roland
A.Roland@index.de
In reply to: Amit Langote (#7)
Re: a misbehavior of partition row movement (?)

Hi,

thanks for looking into this. I haven't yet looked at your patch in detail, yet I think we should address the general issue here. To me this doesn't seem to be a RI-trigger issue, but a more general issue like I mentioned in the pg-bugs thread /messages/by-id/b1bfc99296e34725900bcd9689be8420@index.de

While I like the idea of making fks work, I'd prefer a solution that fires the appropriate row trigger for partitioned tables for non RI-cases as well.

Regards

Arne

#9Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Arne Roland (#8)
Re: a misbehavior of partition row movement (?)

Hi,

On Tue, Dec 15, 2020 at 12:01 AM Arne Roland <A.Roland@index.de> wrote:

thanks for looking into this. I haven't yet looked at your patch in detail, yet I think we should address the general issue here. To me this doesn't seem to be a RI-trigger issue, but a more general issue like I mentioned in the pg-bugs thread /messages/by-id/b1bfc99296e34725900bcd9689be8420@index.de

Hmm, maybe you're reading too much into the implementation details of
the fix, because the patch does fix the issue that you mention in the
linked report:

Quoting your original example:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly errors out instead of silently performing the ON DELETE CASCADE
update a set id=2;
ERROR: update or delete on table "a" violates foreign key constraint
"a_fk" on table "b"
DETAIL: Key (id)=(1) is still referenced from table "b".

select * from b;
id
----
1
(1 row)

Changing the example to specify ON UPDATE CASCADE:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly applies ON UPDATE CASCADE action
update a set id=2;
UPDATE 1

select * from b;
id
----
2
(1 row)

What am I missing about what you think is the more general problem to be solved?

While I like the idea of making fks work, I'd prefer a solution that fires the appropriate row trigger for partitioned tables for non RI-cases as well.

Maybe we could do that, but I don't know of a known issue where the
root cause is our firing of a row trigger on the leaf partition
instead of on the root partitioned table.

--
Amit Langote
EDB: http://www.enterprisedb.com

#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#9)
Re: a misbehavior of partition row movement (?)

On Tue, Dec 15, 2020 at 12:43 PM Amit Langote <amitlangote09@gmail.com> wrote:

Quoting your original example:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly errors out instead of silently performing the ON DELETE CASCADE
update a set id=2;
ERROR: update or delete on table "a" violates foreign key constraint
"a_fk" on table "b"
DETAIL: Key (id)=(1) is still referenced from table "b".

select * from b;
id
----
1
(1 row)

Changing the example to specify ON UPDATE CASCADE:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;

Oops, I copy-pasted the wrong block of text from my terminal. I meant:

alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade on update cascade;

create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly applies ON UPDATE CASCADE action
update a set id=2;
UPDATE 1

select * from b;
id
----
2
(1 row)

--
Amit Langote
EDB: http://www.enterprisedb.com

#11Arne Roland
A.Roland@index.de
In reply to: Amit Langote (#10)
Re: a misbehavior of partition row movement (?)

Hi Amit,

thanks for the quick reply! Sadly I have been busy and the second part of your patch does no longer apply in src/include/nodes/execnodes.h:497.

I'm sorry, I should have been more careful rereading my posts. The case I meant is the one below. I checked the thread again. I can barely believe, I didn't post such an example along back then. Sorry for the confusion!

create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

create or replace function del_trig_fkt()
returns trigger
language plpgsql
as $$
begin
raise notice 'Deleted!';
return old;
end;
$$;
create trigger a_del_trig after delete on a for each row execute function del_trig_fkt();
create or replace function public.upd_trig_fkt()
returns trigger
language plpgsql
as $function$
begin
raise notice 'Updated!';
return new;
end;
$function$;
create trigger a_upd_trig after update on a for each row execute function upd_trig_fkt();

update a set id=2;

To me the issue seems to have litte to do with the fact that the trigger is executed on the leaf node in itself, but rather we lack the infrastructure to track whether the tuple is removed or only rerouted.

Regards
Arne

________________________________
From: Amit Langote <amitlangote09@gmail.com>
Sent: Tuesday, December 15, 2020 4:45:19 AM
To: Arne Roland
Cc: Tomas Vondra; David G. Johnston; PostgreSQL-development
Subject: Re: a misbehavior of partition row movement (?)

On Tue, Dec 15, 2020 at 12:43 PM Amit Langote <amitlangote09@gmail.com> wrote:

Quoting your original example:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly errors out instead of silently performing the ON DELETE CASCADE
update a set id=2;
ERROR: update or delete on table "a" violates foreign key constraint
"a_fk" on table "b"
DETAIL: Key (id)=(1) is still referenced from table "b".

select * from b;
id
----
1
(1 row)

Changing the example to specify ON UPDATE CASCADE:

drop table a, b;
create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade;

Oops, I copy-pasted the wrong block of text from my terminal. I meant:

alter table b add constraint a_fk foreign key (id) references a (id)
on delete cascade on update cascade;

create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

-- correctly applies ON UPDATE CASCADE action
update a set id=2;
UPDATE 1

select * from b;
id
----
2
(1 row)

--
Amit Langote
EDB: http://www.enterprisedb.com

#12Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Arne Roland (#11)
Re: a misbehavior of partition row movement (?)

Hi,

On Mon, Dec 21, 2020 at 11:30 PM Arne Roland <A.Roland@index.de> wrote:

thanks for the quick reply! Sadly I have been busy and the second part of your patch does no longer apply in src/include/nodes/execnodes.h:497.

I don't see any problem with applying the patch. Are you sure you're
applying the patch to the correct git branch? The patch is meant to
be applied to the development (master) branch.

I'm sorry, I should have been more careful rereading my posts. The case I meant is the one below. I checked the thread again. I can barely believe, I didn't post such an example along back then. Sorry for the confusion!

No worries, thanks for the follow up.

create table a (id serial, primary key (id)) partition by range (id);
create table b (id serial, primary key (id)) partition by range (id);
create table a1 partition of a for values from (1) to (2);
create table a2 partition of a for values from (2) to (3);
create table b1 partition of b for values from (1) to (2);
create table b2 partition of b for values from (2) to (3);
insert into a (id) values (1);
insert into b (id) values (1);

create or replace function del_trig_fkt()
returns trigger
language plpgsql
as $$
begin
raise notice 'Deleted!';
return old;
end;
$$;
create trigger a_del_trig after delete on a for each row execute function del_trig_fkt();
create or replace function public.upd_trig_fkt()
returns trigger
language plpgsql
as $function$
begin
raise notice 'Updated!';
return new;
end;
$function$;
create trigger a_upd_trig after update on a for each row execute function upd_trig_fkt();

update a set id=2;

The output for this I get with (or without) the patch is:

NOTICE: Deleted!
UPDATE 1

To me the issue seems to have litte to do with the fact that the trigger is executed on the leaf node in itself, but rather we lack the infrastructure to track whether the tuple is removed or only rerouted.

This behavior of partition key updates with regard to *user-defined*
AFTER triggers is documented:

https://www.postgresql.org/docs/current/trigger-definition.html

"As far as AFTER ROW triggers are concerned, AFTER DELETE and AFTER
INSERT triggers are applied; but AFTER UPDATE triggers are not applied
because the UPDATE has been converted to a DELETE and an INSERT."

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

As for the original issue of this thread, it happens to be fixed by
firing the *internal* AFTER UPDATE triggers that are involved in
enforcing the foreign key even if the UPDATE has been turned into
DELETE+INSERT, which it makes sense to do, because what can happen
today with CASCADE triggers does not seem like a useful behavior by
any measure.

--
Amit Langote
EDB: http://www.enterprisedb.com

#13Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#12)
Re: a misbehavior of partition row movement (?)

On Tue, Dec 22, 2020 at 4:16 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Mon, Dec 21, 2020 at 11:30 PM Arne Roland <A.Roland@index.de> wrote:

thanks for the quick reply! Sadly I have been busy and the second part of your patch does no longer apply in src/include/nodes/execnodes.h:497.

I don't see any problem with applying the patch. Are you sure you're
applying the patch to the correct git branch? The patch is meant to
be applied to the development (master) branch.

Sorry, it seems you are right and the 2nd patch indeed fails to apply to master.

Attached updated version.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-Create-foreign-key-triggers-in-partitioned-tables.patchapplication/octet-stream; name=v2-0001-Create-foreign-key-triggers-in-partitioned-tables.patchDownload+407-84
v2-0002-Enforce-foreign-key-correctly-during-cross-partit.patchapplication/octet-stream; name=v2-0002-Enforce-foreign-key-correctly-during-cross-partit.patchDownload+569-72
#14Arne Roland
A.Roland@index.de
In reply to: Amit Langote (#13)
Re: a misbehavior of partition row movement (?)

While I'd agree that simply deleting with "on delete cascade" on tuple rerouting is a strong enough violation of the spirit of partitioning changing that behavior, I am surprised by the idea to make a differentiation between fks and other triggers. The way user defined triggers work, is a violation to the same degree I get complaints about that on a monthly (if not weekly) basis. It's easy to point towards the docs, but the docs offer no solution to archive the behavior, that makes partitioning somewhat transparent. Most people I know don't partition because they like to complicate things, but just because they have to much data.

It isn't just a thing with after triggers. Imo before triggers are even worse: If we move a row between partitions we fire all three triggers at once (at different leaf pages obviously). It's easy to point towards the docs. Heart bleed was well documented, but I am happy that it was fixed. I don't want to imply this totally unrelated security issue has anything to do with our weird behavior. I just want to raise the question whether that's a good thing, because frankly I haven't met anyone thus far, who thought it is.

To me it seems the RI triggers are just a specific victim of the way we made triggers work in general.

What I tried to express, albeit I apparently failed: I care about having triggers, which make partitioning transparent, on the long run.

because what can happen
today with CASCADE triggers does not seem like a useful behavior by
any measure.

I wholeheartedly agree. I just want to point out, that you could state the same about triggers in general.

Backwards compatibility is usually a pretty compelling argument. I would still want to raise the question, whether this should change, because I frankly think this is a bad idea.

You might ask: Why am I raising this question in this thread?

In case we can not (instantly) agree on the fact that this behavior should change, I'd still prefer to think about making that behavior possible for other triggers (possibly later) as well. One possibility would be having an entry in the catalog to mark when the trigger should fire.

I don't want to stall your definitely useful RI-Trigger fix, but I sincerely believe, that we have to do better with triggers in general.

If we would make the condition when to fire or not dependent something besides that fact whether the trigger is internal, we could at a later date choose to make both ways available, if anyone makes a good case for this. Even though I still think it's not worth it.

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you have any ideas in which thread that was handled?

Sorry, it seems you are right and the 2nd patch indeed fails to apply to master.

Thank you! I hope to have a more in depth look later this week.

Regards
Arne

#15Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Arne Roland (#14)
Re: a misbehavior of partition row movement (?)

Hi,

On Mon, Dec 28, 2020 at 10:01 PM Arne Roland <A.Roland@index.de> wrote:

While I'd agree that simply deleting with "on delete cascade" on tuple rerouting is a strong enough violation of the spirit of partitioning changing that behavior, I am surprised by the idea to make a differentiation between fks and other triggers. The way user defined triggers work, is a violation to the same degree I get complaints about that on a monthly (if not weekly) basis. It's easy to point towards the docs, but the docs offer no solution to archive the behavior, that makes partitioning somewhat transparent. Most people I know don't partition because they like to complicate things, but just because they have to much data.

It isn't just a thing with after triggers. Imo before triggers are even worse: If we move a row between partitions we fire all three triggers at once (at different leaf pages obviously). It's easy to point towards the docs. Heart bleed was well documented, but I am happy that it was fixed. I don't want to imply this totally unrelated security issue has anything to do with our weird behavior. I just want to raise the question whether that's a good thing, because frankly I haven't met anyone thus far, who thought it is.

Just to be clear, are you only dissatisfied with the firing of
triggers during a row-moving UPDATEs on partitioned tables or all
firing behaviors of triggers defined on partitioned tables? Can you
give a specific example of the odd behavior in that case?

To me it seems the RI triggers are just a specific victim of the way we made triggers work in general.

That's true.

What I tried to express, albeit I apparently failed: I care about having triggers, which make partitioning transparent, on the long run.

because what can happen
today with CASCADE triggers does not seem like a useful behavior by
any measure.

I wholeheartedly agree. I just want to point out, that you could state the same about triggers in general.

Backwards compatibility is usually a pretty compelling argument. I would still want to raise the question, whether this should change, because I frankly think this is a bad idea.

You might ask: Why am I raising this question in this thread?

In case we can not (instantly) agree on the fact that this behavior should change, I'd still prefer to think about making that behavior possible for other triggers (possibly later) as well. One possibility would be having an entry in the catalog to mark when the trigger should fire.

Sorry, I don't understand what new setting for triggers you are
thinking of here.

I don't want to stall your definitely useful RI-Trigger fix, but I sincerely believe, that we have to do better with triggers in general.

If we would make the condition when to fire or not dependent something besides that fact whether the trigger is internal, we could at a later date choose to make both ways available, if anyone makes a good case for this. Even though I still think it's not worth it.

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you have any ideas in which thread that was handled?

It was discussed here:

/messages/by-id/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails. You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach. Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)

--
Amit Langote
EDB: http://www.enterprisedb.com

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Amit Langote (#15)
Re: a misbehavior of partition row movement (?)

On 2021-01-08 09:54, Amit Langote wrote:

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you have any ideas in which thread that was handled?

It was discussed here:

/messages/by-id/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails. You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach. Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)

Could you summarize here what you are trying to do with respect to what
was decided before? I'm a bit confused, looking through the patches you
have posted. The first patch you posted hard-coded FK trigger OIDs
specifically, other patches talk about foreign key triggers in general
or special case internal triggers or talk about all triggers.

#17Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Peter Eisentraut (#16)
Re: a misbehavior of partition row movement (?)

On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 2021-01-08 09:54, Amit Langote wrote:

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you have any ideas in which thread that was handled?

It was discussed here:

/messages/by-id/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails. You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach. Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)

Could you summarize here what you are trying to do with respect to what
was decided before? I'm a bit confused, looking through the patches you
have posted. The first patch you posted hard-coded FK trigger OIDs
specifically, other patches talk about foreign key triggers in general
or special case internal triggers or talk about all triggers.

The original problem statement is this: the way we generally fire
row-level triggers of a partitioned table can lead to some unexpected
behaviors of the foreign keys pointing to that partitioned table
during its cross-partition updates.

Let's start with an example that shows how triggers are fired during a
cross-partition update:

create table p (a numeric primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig1 before update on p for each row execute function
report_trig_details();
create trigger trig2 after update on p for each row execute function
report_trig_details();
create trigger trig3 before delete on p for each row execute function
report_trig_details();
create trigger trig4 after delete on p for each row execute function
report_trig_details();
create trigger trig5 before insert on p for each row execute function
report_trig_details();
create trigger trig6 after insert on p for each row execute function
report_trig_details();

insert into p values (1);
update p set a = 2;
NOTICE: BEFORE UPDATE on p1
NOTICE: BEFORE DELETE on p1
NOTICE: BEFORE INSERT on p2
NOTICE: AFTER DELETE on p1
NOTICE: AFTER INSERT on p2
UPDATE 1

(AR update triggers are not fired.)

For contrast, here is an intra-partition update:

update p set a = a;
NOTICE: BEFORE UPDATE on p2
NOTICE: AFTER UPDATE on p2
UPDATE 1

Now, the trigger machinery makes no distinction between user-defined
and internal triggers, which has implications for the foreign key
enforcing triggers on partitions. Consider the following example:

create table q (a bigint references p);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

So the RI delete trigger (NOT update) on p2 prevents the DELETE that
occurs as part of the row movement. One might make the updates
cascade and expect that to prevent the error:

drop table q;
create table q (a bigint references p on update cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

No luck, because again it's the RI delete trigger on p2 that gets
fired. If you make deletes cascade too, an even worse thing happens:

drop table q;
create table q (a bigint references p on update cascade on delete cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
NOTICE: AFTER DELETE on p2
NOTICE: AFTER INSERT on p1
UPDATE 1
select * from q;
a
---
(0 rows)

The RI delete trigger deleted 2 from q, whereas the expected result
would've been for q to be updated to change 2 to 1.

This shows that the way we've made these triggers behave in general
can cause some unintended behaviors for foreign keys during
cross-partition updates. I started this thread to do something about
that and sent a patch to prevent cross-partition updates at all when
there are foreign keys pointing to it. As others pointed out, that's
not a great long-term solution to the problem, but that's what we may
have to do in the back-branches if anything at all.

So I wrote another patch targeting the dev branch to make the
cross-partition updates work while producing a sane foreign key
behavior. The idea of the patch is to tweak the firing of AFTER
triggers such that unintended RI triggers don't get fired, that is,
those corresponding to DELETE and INSERT occurring internally as part
of a cross-partition update. Instead we now fire the AFTER UPDATE
triggers, passing the root table as the target result relation (not
the source partition because the new row doesn't belong to it). This
results in the same foreign key behavior as when no partitioning is
involved at all.

Then, Arne came along and suggested that we do this kind of firing for
*all* triggers, not just the internal RI triggers, or at least that's
what I understood Arne as saying. That however would be changing the
original design of cross-partition updates and will change the
documented user-visible trigger behavior. Changing this for internal
triggers like the patch does changes no user-visible behavior, AFAIK,
other than fixing the foreign key annoyance. So I said if we do want
to go the way that Arne is suggesting, it should be its own discussion
and that's that.

Sorry for a long "summary", but I hope it helps clarify things somewhat.

--
Amit Langote
EDB: http://www.enterprisedb.com

#18Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#17)
Re: a misbehavior of partition row movement (?)

On Wed, Jan 20, 2021 at 7:03 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut

Could you summarize here what you are trying to do with respect to what
was decided before? I'm a bit confused, looking through the patches you
have posted. The first patch you posted hard-coded FK trigger OIDs
specifically, other patches talk about foreign key triggers in general
or special case internal triggers or talk about all triggers.

The original problem statement is this: the way we generally fire
row-level triggers of a partitioned table can lead to some unexpected
behaviors of the foreign keys pointing to that partitioned table
during its cross-partition updates.

Let's start with an example that shows how triggers are fired during a
cross-partition update:

create table p (a numeric primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig1 before update on p for each row execute function
report_trig_details();
create trigger trig2 after update on p for each row execute function
report_trig_details();
create trigger trig3 before delete on p for each row execute function
report_trig_details();
create trigger trig4 after delete on p for each row execute function
report_trig_details();
create trigger trig5 before insert on p for each row execute function
report_trig_details();
create trigger trig6 after insert on p for each row execute function
report_trig_details();

insert into p values (1);
update p set a = 2;
NOTICE: BEFORE UPDATE on p1
NOTICE: BEFORE DELETE on p1
NOTICE: BEFORE INSERT on p2
NOTICE: AFTER DELETE on p1
NOTICE: AFTER INSERT on p2
UPDATE 1

(AR update triggers are not fired.)

For contrast, here is an intra-partition update:

update p set a = a;
NOTICE: BEFORE UPDATE on p2
NOTICE: AFTER UPDATE on p2
UPDATE 1

Now, the trigger machinery makes no distinction between user-defined
and internal triggers, which has implications for the foreign key
enforcing triggers on partitions. Consider the following example:

create table q (a bigint references p);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

So the RI delete trigger (NOT update) on p2 prevents the DELETE that
occurs as part of the row movement. One might make the updates
cascade and expect that to prevent the error:

drop table q;
create table q (a bigint references p on update cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

No luck, because again it's the RI delete trigger on p2 that gets
fired. If you make deletes cascade too, an even worse thing happens:

drop table q;
create table q (a bigint references p on update cascade on delete cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
NOTICE: AFTER DELETE on p2
NOTICE: AFTER INSERT on p1
UPDATE 1
select * from q;
a
---
(0 rows)

The RI delete trigger deleted 2 from q, whereas the expected result
would've been for q to be updated to change 2 to 1.

This shows that the way we've made these triggers behave in general
can cause some unintended behaviors for foreign keys during
cross-partition updates. I started this thread to do something about
that and sent a patch to prevent cross-partition updates at all when
there are foreign keys pointing to it. As others pointed out, that's
not a great long-term solution to the problem, but that's what we may
have to do in the back-branches if anything at all.

So I wrote another patch targeting the dev branch to make the
cross-partition updates work while producing a sane foreign key
behavior. The idea of the patch is to tweak the firing of AFTER
triggers such that unintended RI triggers don't get fired, that is,
those corresponding to DELETE and INSERT occurring internally as part
of a cross-partition update. Instead we now fire the AFTER UPDATE
triggers, passing the root table as the target result relation (not
the source partition because the new row doesn't belong to it). This
results in the same foreign key behavior as when no partitioning is
involved at all.

Then, Arne came along and suggested that we do this kind of firing for
*all* triggers, not just the internal RI triggers, or at least that's
what I understood Arne as saying. That however would be changing the
original design of cross-partition updates and will change the
documented user-visible trigger behavior. Changing this for internal
triggers like the patch does changes no user-visible behavior, AFAIK,
other than fixing the foreign key annoyance. So I said if we do want
to go the way that Arne is suggesting, it should be its own discussion
and that's that.

Sorry for a long "summary", but I hope it helps clarify things somewhat.

Here is an updated version of the patch with some cosmetic changes
from the previous version. I moved the code being added to
AfterTriggerSaveEvent() and ExecUpdate() into separate subroutines to
improve readability, hopefully.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v3-0001-Create-foreign-key-triggers-in-partitioned-tables.patchapplication/octet-stream; name=v3-0001-Create-foreign-key-triggers-in-partitioned-tables.patchDownload+407-84
v3-0002-Enforce-foreign-key-correctly-during-cross-partit.patchapplication/octet-stream; name=v3-0002-Enforce-foreign-key-correctly-during-cross-partit.patchDownload+609-72
#19Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Langote (#17)
Re: a misbehavior of partition row movement (?)

On Wed, Jan 20, 2021 at 7:04 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Wed, Jan 20, 2021 at 4:13 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 2021-01-08 09:54, Amit Langote wrote:

I don't quite recall if the decision to implement it like this was
based on assuming that this is what users would like to see happen in
this case or the perceived difficulty of implementing it the other way
around, that is, of firing AFTER UPDATE triggers in this case.

I tried to look that up, but I couldn't find any discussion about this. Do you have any ideas in which thread that was handled?

It was discussed here:

/messages/by-id/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com

It's a huge discussion, so you'll have to ctrl+f "trigger" to spot
relevant emails. You might notice that the developers who
participated in that discussion gave various opinions and what we have
today got there as a result of a majority of them voting for the
current approach. Someone also said this during the discussion:
"Regarding the trigger issue, I can't claim to have a terribly strong
opinion on this. I think that practically anything we do here might
upset somebody, but probably any halfway-reasonable thing we choose to
do will be OK for most people." So what we've got is that
"halfway-reasonable" thing, YMMV. :)

Could you summarize here what you are trying to do with respect to what
was decided before? I'm a bit confused, looking through the patches you
have posted. The first patch you posted hard-coded FK trigger OIDs
specifically, other patches talk about foreign key triggers in general
or special case internal triggers or talk about all triggers.

The original problem statement is this: the way we generally fire
row-level triggers of a partitioned table can lead to some unexpected
behaviors of the foreign keys pointing to that partitioned table
during its cross-partition updates.

Let's start with an example that shows how triggers are fired during a
cross-partition update:

create table p (a numeric primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig1 before update on p for each row execute function
report_trig_details();
create trigger trig2 after update on p for each row execute function
report_trig_details();
create trigger trig3 before delete on p for each row execute function
report_trig_details();
create trigger trig4 after delete on p for each row execute function
report_trig_details();
create trigger trig5 before insert on p for each row execute function
report_trig_details();
create trigger trig6 after insert on p for each row execute function
report_trig_details();

insert into p values (1);
update p set a = 2;
NOTICE: BEFORE UPDATE on p1
NOTICE: BEFORE DELETE on p1
NOTICE: BEFORE INSERT on p2
NOTICE: AFTER DELETE on p1
NOTICE: AFTER INSERT on p2
UPDATE 1

(AR update triggers are not fired.)

For contrast, here is an intra-partition update:

update p set a = a;
NOTICE: BEFORE UPDATE on p2
NOTICE: AFTER UPDATE on p2
UPDATE 1

Now, the trigger machinery makes no distinction between user-defined
and internal triggers, which has implications for the foreign key
enforcing triggers on partitions. Consider the following example:

create table q (a bigint references p);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

So the RI delete trigger (NOT update) on p2 prevents the DELETE that
occurs as part of the row movement. One might make the updates
cascade and expect that to prevent the error:

drop table q;
create table q (a bigint references p on update cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
ERROR: update or delete on table "p2" violates foreign key constraint
"q_a_fkey2" on table "q"
DETAIL: Key (a)=(2) is still referenced from table "q".

No luck, because again it's the RI delete trigger on p2 that gets
fired. If you make deletes cascade too, an even worse thing happens:

drop table q;
create table q (a bigint references p on update cascade on delete cascade);
insert into q values (2);
update p set a = 1;
NOTICE: BEFORE UPDATE on p2
NOTICE: BEFORE DELETE on p2
NOTICE: BEFORE INSERT on p1
NOTICE: AFTER DELETE on p2
NOTICE: AFTER INSERT on p1
UPDATE 1
select * from q;
a
---
(0 rows)

The RI delete trigger deleted 2 from q, whereas the expected result
would've been for q to be updated to change 2 to 1.

Thank you for a good summary. That's helpful to catch up on this thread.

This shows that the way we've made these triggers behave in general
can cause some unintended behaviors for foreign keys during
cross-partition updates. I started this thread to do something about
that and sent a patch to prevent cross-partition updates at all when
there are foreign keys pointing to it. As others pointed out, that's
not a great long-term solution to the problem, but that's what we may
have to do in the back-branches if anything at all.

I've started by reviewing the patch for back-patching that the first
patch you posted[1]/messages/by-id/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com.

+       for (i = 0; i < trigdesc->numtriggers; i++)
+       {
+           Trigger *trigger = &trigdesc->triggers[i];
+
+           if (trigger->tgisinternal &&
+               OidIsValid(trigger->tgconstrrelid) &&
+               trigger->tgfoid == F_RI_FKEY_CASCADE_DEL)
+           {
+               found = true;
+               break;
+           }
+       }

IIUC the above checks if the partition table is referenced by a
foreign key constraint on another table with ON DELETE CASCADE option.
I think we should prevent cross-partition update also when ON DELETE
SET NULL and ON DELETE SET DEFAULT. For example, with the patch, a
tuple in a partition table is still updated to NULL when
cross-partition update as follows:

postgres=# create table p (a numeric primary key) partition by list (a);
CREATE TABLE
postgres=# create table p1 partition of p for values in (1);
CREATE TABLE
postgres=# create table p2 partition of p for values in (2);
CREATE TABLE
postgres=# insert into p values (1);
INSERT 0 1
postgres=# create table q (a int references p(a) on delete set null);
CREATE TABLE
postgres=# insert into q values (1);
INSERT 0 1
postgres=# update p set a = 2;
UPDATE 1
postgres=# table p;
a
---
2
(1 row)

postgres=# table q;
a
---

(1 row)

Regards,

[1]: /messages/by-id/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#20Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Masahiko Sawada (#19)
Re: a misbehavior of partition row movement (?)

Thank you for looking at this.

On Mon, Feb 15, 2021 at 4:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jan 20, 2021 at 7:04 PM Amit Langote <amitlangote09@gmail.com> wrote:

This shows that the way we've made these triggers behave in general
can cause some unintended behaviors for foreign keys during
cross-partition updates. I started this thread to do something about
that and sent a patch to prevent cross-partition updates at all when
there are foreign keys pointing to it. As others pointed out, that's
not a great long-term solution to the problem, but that's what we may
have to do in the back-branches if anything at all.

I've started by reviewing the patch for back-patching that the first
patch you posted[1].

+       for (i = 0; i < trigdesc->numtriggers; i++)
+       {
+           Trigger *trigger = &trigdesc->triggers[i];
+
+           if (trigger->tgisinternal &&
+               OidIsValid(trigger->tgconstrrelid) &&
+               trigger->tgfoid == F_RI_FKEY_CASCADE_DEL)
+           {
+               found = true;
+               break;
+           }
+       }

IIUC the above checks if the partition table is referenced by a
foreign key constraint on another table with ON DELETE CASCADE option.
I think we should prevent cross-partition update also when ON DELETE
SET NULL and ON DELETE SET DEFAULT. For example, with the patch, a
tuple in a partition table is still updated to NULL when
cross-partition update as follows:

postgres=# create table p (a numeric primary key) partition by list (a);
CREATE TABLE
postgres=# create table p1 partition of p for values in (1);
CREATE TABLE
postgres=# create table p2 partition of p for values in (2);
CREATE TABLE
postgres=# insert into p values (1);
INSERT 0 1
postgres=# create table q (a int references p(a) on delete set null);
CREATE TABLE
postgres=# insert into q values (1);
INSERT 0 1
postgres=# update p set a = 2;
UPDATE 1
postgres=# table p;
a
---
2
(1 row)

postgres=# table q;
a
---

(1 row)

Yeah, I agree that's not good.

Actually, I had meant to send an updated version of the patch to apply
in back-branches that would prevent such a cross-partition update, but
never did since starting to work on the patch for HEAD. I have
attached it here.

Regarding the patch, I would have liked if it only prevented the
update when the foreign key that would be violated by the component
delete references the update query's main target relation. If the
foreign key references the source partition directly, then the delete
would be harmless. However there's not enough infrastructure in v12,
v13 branches to determine that, which makes back-patching this a bit
hard. For example, there's no way to tell in the back-branches if the
foreign-key-enforcing triggers of a leaf partition have descended from
the parent table. IOW, I am not so sure anymore if we should try to
do anything in the back-branches.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

prevent-row-movement-on-pk-table.patchapplication/octet-stream; name=prevent-row-movement-on-pk-table.patchDownload+46-5
#21Rahila Syed
rahilasyed90@gmail.com
In reply to: Amit Langote (#18)
#22Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Rahila Syed (#21)
#23Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Langote (#20)
#24Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Masahiko Sawada (#23)
#25Rahila Syed
rahilasyed90@gmail.com
In reply to: Amit Langote (#22)
#26Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Langote (#24)
#27Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Rahila Syed (#25)
#28Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Langote (#27)
#29Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Masahiko Sawada (#28)
#30Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Langote (#29)
#31Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Masahiko Sawada (#30)
#32Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Amit Langote (#31)
#33Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Ibrar Ahmed (#32)
#34Andrew Dunstan
andrew@dunslane.net
In reply to: Amit Langote (#33)
#35Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andrew Dunstan (#34)
#36Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#35)
#37Zhihong Yu
zyu@yugabyte.com
In reply to: Amit Langote (#36)
#38Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#36)
#39Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#38)
#40Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#39)
#41Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#40)
#42Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#41)
#43Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#42)
#44Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#43)
#45Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#44)
#46Julien Rouhaud
rjuju123@gmail.com
In reply to: Amit Langote (#44)
#47Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Julien Rouhaud (#46)
#48Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#45)
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#48)
#50Zhihong Yu
zyu@yugabyte.com
In reply to: Alvaro Herrera (#49)
#51Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Zhihong Yu (#50)
#52Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#48)
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#52)
#54Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#54)
#56Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#54)
#57Julien Rouhaud
rjuju123@gmail.com
In reply to: Amit Langote (#48)
#58Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#57)
#59Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#58)
#60Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Julien Rouhaud (#59)
#61Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#56)
#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Julien Rouhaud (#57)
#63Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#61)
#64Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#63)
#65Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#64)
#66Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#64)
#67Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#66)
#68Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#65)
#69Zhihong Yu
zyu@yugabyte.com
In reply to: Alvaro Herrera (#68)
#70Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Zhihong Yu (#69)
#71Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#70)
#72Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Amit Langote (#71)
#73Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Alvaro Herrera (#72)