Fix for a crash caused by triggers in cross-partition updates
Hello.
We received a crash report related to cross-partition updates
involving multiple triggers.
After investigating the situation, we found that the crash occurs by
the following steps in PostgreSQL versions 11 through 14 (We have not
checked versions 10 and earlier.):
create table tp(a int) partition by range (a);
create table tc1 partition of tp for values from (0) to (10);
create table tc2 partition of tp for values from (10) to (20);
insert into tp values (1);
create or replace function trg() returns trigger as $$BEGIN RETURN NULL; END; $$ language plpgsql;
create trigger td after delete on tp referencing old table old for each statement execute function trg();
create trigger tu after update on tp referencing new table new for each statement execute function trg();
update tp set a = 11 where a = 1;
<crash>
In this scenario, AfterTriggerSaveEvent is passed an inconsistent
TransitionCaptureState. Specifically, it indicates that
delete_old_table is required, but the corresponding tuplestore
(old_tuplestore) is not actually provided, leading to a crash. This
issue occurs because the function indiscriminately sets the
.tcs_*_old/new_table flags regardless of cmdType. The inconsistency
results in a crash during ExecCrossPartitionUpdate, where ExecDelete
is invoked as part of an UPDATE command.
The crash appears to have been accidentally fixed in commit
3a46a45f6f0 by simply skipping tuple storage when a tuplestore is not
available (in TransitionTableAddTuple). However, I believe that
MakeTransitionCaptureState returning an inconsistent
TransitionCaptureState is fundamentally problematic.
The attached first patch adds a test case that causes the crash in
versions 11 through 14 but executes successfully in version 15 and
later.
The second patch provides a fix for versions 11 to 14, and the third
patch applies the same fix to version 15 and later, purely for
consistency with the first patch. These two patches are designed so
that the differences between the code after applying each of them
remain small.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Add-test-for-a-crash-bug.patchtext/x-patch; charset=us-asciiDownload
From 8322ba28b46f5d51aee0516ec61f33f2f695f4c2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 4 Feb 2025 15:14:27 +0900
Subject: [PATCH 1/2] Add test for a crash bug
---
src/test/regress/expected/triggers.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/triggers.sql | 50 ++++++++++++++++++++++++++
2 files changed, 98 insertions(+)
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index d3e02ca63b3..5aed966eac3 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2957,6 +2957,54 @@ drop trigger child_row_trig on child;
alter table parent attach partition child for values in ('AAA');
drop table child, parent;
--
+-- Verify that UPDATE triggers do not access incorrect transition
+-- tables (which leads to a crash) due to other types of triggers.
+--
+create or replace function dump_update_new() returns trigger language plpgsql as
+$$
+ begin
+ raise notice 'trigger = %, new table = %',
+ TG_NAME,
+ (select string_agg(new_table::text, ', ' order by a) from new_table);
+ return null;
+ end;
+$$;
+create or replace function dump_update_old() returns trigger language plpgsql as
+$$
+ begin
+ raise notice 'trigger = %, old table = %',
+ TG_NAME,
+ (select string_agg(old_table::text, ', ' order by a) from old_table);
+ return null;
+ end;
+$$;
+create table parent (a text) partition by list (a);
+create table child1 partition of parent for values in ('AAA');
+create table child2 partition of parent for values in ('BBB');
+create trigger parent_update_trig
+ after update on parent referencing old table as old_table
+ for each statement execute procedure dump_update_old();
+create trigger parent_insert_trig
+ after insert on parent referencing new table as new_table
+ for each statement execute procedure dump_insert();
+create trigger parent_delete_trig
+ after delete on parent referencing old table as old_table
+ for each statement execute procedure dump_delete();
+insert into parent values ('AAA');
+NOTICE: trigger = parent_insert_trig, new table = (AAA)
+update parent set a = 'BBB' where a = 'AAA'; -- should not trigger access to new
+NOTICE: trigger = parent_update_trig, old table = (AAA)
+drop trigger parent_update_trig on parent;
+create trigger parent_update_trig
+ after update on parent referencing new table as new_table
+ for each statement execute procedure dump_update_new();
+update parent set a = 'AAA' where a = 'BBB'; -- should not trigger access to old
+NOTICE: trigger = parent_update_trig, new table = (AAA)
+delete from parent;
+NOTICE: trigger = parent_delete_trig, old table = (AAA)
+drop table parent, child1, child2;
+drop function dump_update_new, dump_update_old;
+--
-- Verify behavior of statement triggers on (non-partition)
-- inheritance hierarchy with transition tables; similar to the
-- partition case, except there is no rerouting on insertion and child
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index 4cc096265db..3f3e26ee1ee 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -2118,6 +2118,56 @@ alter table parent attach partition child for values in ('AAA');
drop table child, parent;
+--
+-- Verify that UPDATE triggers do not access incorrect transition
+-- tables (which leads to a crash) due to other types of triggers.
+--
+create or replace function dump_update_new() returns trigger language plpgsql as
+$$
+ begin
+ raise notice 'trigger = %, new table = %',
+ TG_NAME,
+ (select string_agg(new_table::text, ', ' order by a) from new_table);
+ return null;
+ end;
+$$;
+
+create or replace function dump_update_old() returns trigger language plpgsql as
+$$
+ begin
+ raise notice 'trigger = %, old table = %',
+ TG_NAME,
+ (select string_agg(old_table::text, ', ' order by a) from old_table);
+ return null;
+ end;
+$$;
+
+create table parent (a text) partition by list (a);
+create table child1 partition of parent for values in ('AAA');
+create table child2 partition of parent for values in ('BBB');
+create trigger parent_update_trig
+ after update on parent referencing old table as old_table
+ for each statement execute procedure dump_update_old();
+create trigger parent_insert_trig
+ after insert on parent referencing new table as new_table
+ for each statement execute procedure dump_insert();
+create trigger parent_delete_trig
+ after delete on parent referencing old table as old_table
+ for each statement execute procedure dump_delete();
+
+insert into parent values ('AAA');
+update parent set a = 'BBB' where a = 'AAA'; -- should not trigger access to new
+
+drop trigger parent_update_trig on parent;
+create trigger parent_update_trig
+ after update on parent referencing new table as new_table
+ for each statement execute procedure dump_update_new();
+update parent set a = 'AAA' where a = 'BBB'; -- should not trigger access to old
+delete from parent;
+
+drop table parent, child1, child2;
+drop function dump_update_new, dump_update_old;
+
--
-- Verify behavior of statement triggers on (non-partition)
-- inheritance hierarchy with transition tables; similar to the
--
2.43.5
0002-Fix-MakeTransitionCaptureState-to-return-a-consisten_11-14.patchtext/x-patch; charset=us-asciiDownload
From 5c0ddab6b640518d65b52fca40154126c58925a6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 7 Feb 2025 14:12:18 +0900
Subject: [PATCH 2/2] Fix MakeTransitionCaptureState to return a consistent
result
When both an UPDATE trigger referencing NEW TABLE and a DELETE trigger
referencing OLD TABLE are present, the function returns an
inconsistent result for UPDATE command between reference flags and
tuplestores. This causes a crash in version 14 and earlier during
cross-partition UPDATEs on a partitioned table.
This commit fixes the function so that it returns a consistent state
by incorporating part of the changes made in commit 7103ebb7aae in
version 15 as part of its own fix, thereby reducing the code
differences with version 15 and later versions.
---
src/backend/commands/trigger.c | 36 +++++++++++++++++++---------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d04b4beed91..10cfd59aa52 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -4511,8 +4511,10 @@ TransitionCaptureState *
MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
{
TransitionCaptureState *state;
- bool need_old,
- need_new;
+ bool need_old_upd,
+ need_new_upd,
+ need_old_del,
+ need_new_ins;
AfterTriggersTableData *table;
MemoryContext oldcxt;
ResourceOwner saveResourceOwner;
@@ -4524,23 +4526,25 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
switch (cmdType)
{
case CMD_INSERT:
- need_old = false;
- need_new = trigdesc->trig_insert_new_table;
+ need_old_upd = need_old_del = need_new_upd = false;
+ need_new_ins = trigdesc->trig_insert_new_table;
break;
case CMD_UPDATE:
- need_old = trigdesc->trig_update_old_table;
- need_new = trigdesc->trig_update_new_table;
+ need_old_upd = trigdesc->trig_update_old_table;
+ need_new_upd = trigdesc->trig_update_new_table;
+ need_old_del = need_new_ins = false;
break;
case CMD_DELETE:
- need_old = trigdesc->trig_delete_old_table;
- need_new = false;
+ need_old_del = trigdesc->trig_delete_old_table;
+ need_old_upd = need_new_upd = need_new_ins = false;
break;
default:
elog(ERROR, "unexpected CmdType: %d", (int) cmdType);
- need_old = need_new = false; /* keep compiler quiet */
+ /* keep compiler quiet */
+ need_old_upd = need_new_upd = need_old_del = need_new_ins = false;
break;
}
- if (!need_old && !need_new)
+ if (!need_old_upd && !need_new_upd && !need_new_ins && !need_old_del)
return NULL;
/* Check state, like AfterTriggerSaveEvent. */
@@ -4570,9 +4574,9 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
saveResourceOwner = CurrentResourceOwner;
CurrentResourceOwner = CurTransactionResourceOwner;
- if (need_old && table->old_tuplestore == NULL)
+ if ((need_old_upd || need_old_del) && table->old_tuplestore == NULL)
table->old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
- if (need_new && table->new_tuplestore == NULL)
+ if ((need_new_upd || need_new_ins) && table->new_tuplestore == NULL)
table->new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
CurrentResourceOwner = saveResourceOwner;
@@ -4580,10 +4584,10 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
/* Now build the TransitionCaptureState struct, in caller's context */
state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState));
- state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
- state->tcs_update_old_table = trigdesc->trig_update_old_table;
- state->tcs_update_new_table = trigdesc->trig_update_new_table;
- state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+ state->tcs_delete_old_table = need_old_del;
+ state->tcs_update_old_table = need_old_upd;
+ state->tcs_update_new_table = need_new_upd;
+ state->tcs_insert_new_table = need_new_ins;
state->tcs_private = table;
return state;
--
2.43.5
0002-Fix-MakeTransitionCaptureState-to-return-a-consisten_15-master.patchtext/x-patch; charset=us-asciiDownload
From c77c9387b77f700bede9d5cc949045ecd1e2a8d0 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 7 Feb 2025 13:55:52 +0900
Subject: [PATCH 2/2] Fix MakeTransitionCaptureState to return a consistent
result
When both an UPDATE trigger referencing NEW TABLE and a DELETE trigger
referencing OLD TABLE are present, the function returns an
inconsistent result for UPDATE command between reference flags and
tuplestores. This caused a crash in version 14 and earlier during
cross-partition UPDATEs on a partitioned table, but commit 3a46a45f6f0
accidentally fixed the issue.
Still, it is better to fix this inconsistency, and since similar fixes
have been made in older versions, we should apply the same fix in this
version to keep the code consistent across major versions.
---
src/backend/commands/trigger.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7a5ffe32f60..05b31bcb30a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -4991,10 +4991,10 @@ MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
/* Now build the TransitionCaptureState struct, in caller's context */
state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState));
- state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
- state->tcs_update_old_table = trigdesc->trig_update_old_table;
- state->tcs_update_new_table = trigdesc->trig_update_new_table;
- state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+ state->tcs_delete_old_table = need_old_del;
+ state->tcs_update_old_table = need_old_upd;
+ state->tcs_update_new_table = need_new_upd;
+ state->tcs_insert_new_table = need_new_ins;
state->tcs_private = table;
return state;
--
2.43.5
On Fri, Feb 07, 2025 at 03:02:38PM +0900, Kyotaro Horiguchi wrote:
We received a crash report related to cross-partition updates
involving multiple triggers.After investigating the situation, we found that the crash occurs by
the following steps in PostgreSQL versions 11 through 14 (We have not
checked versions 10 and earlier.):create table tp(a int) partition by range (a);
create table tc1 partition of tp for values from (0) to (10);
create table tc2 partition of tp for values from (10) to (20);
insert into tp values (1);
create or replace function trg() returns trigger as $$BEGIN RETURN NULL; END; $$ language plpgsql;
create trigger td after delete on tp referencing old table old for each statement execute function trg();
create trigger tu after update on tp referencing new table new for each statement execute function trg();update tp set a = 11 where a = 1;
<crash>
Just to answer your question, on v10 it fails differently (also
replace "function" by "procedure" in the CREATE TRIGGER queries):
ERROR: 23514: new row for relation "tc1" violates partition constraint
DETAIL: Failing row contains (11).
Anyway, support for triggers on partitioned tables has been added in
v11, so this has never really worked on this branch. :D
The crash appears to have been accidentally fixed in commit
3a46a45f6f0 by simply skipping tuple storage when a tuplestore is not
available (in TransitionTableAddTuple). However, I believe that
MakeTransitionCaptureState returning an inconsistent
TransitionCaptureState is fundamentally problematic.
Hmm.. Agreed, it seems that you are right in the way of taking care
of this inconsistency. That's interesting. I would need to look at
that more closely with a couple of hours head down. It's a bit late
in the week here so that's not going to happen today. Note that we
have a release coming next week and all stable branches should not be
touched, but perhaps somebody will be able to beat me here.
The attached first patch adds a test case that causes the crash in
versions 11 through 14 but executes successfully in version 15 and
later.
Yeah, agreed to add these tests on all the branches, even 15~.
--
Michael
On Fri, Feb 07, 2025 at 03:47:02PM +0900, Michael Paquier wrote:
Hmm.. Agreed, it seems that you are right in the way of taking care
of this inconsistency. That's interesting. I would need to look at
that more closely with a couple of hours head down. It's a bit late
in the week here so that's not going to happen today. Note that we
have a release coming next week and all stable branches should not be
touched, but perhaps somebody will be able to beat me here.
Put my head down for a couple of hours on this one to untangle these
bits of states with the old and new transition tables, and your
suggestions look right, including the tweaks in 15~HEAD for the flag
values to keep the branches a bit more consistent.
One thing that itched me a bit was in the test, where I found a bit
more useful to have two values in the partitions on top of the value
moved to a different partition, making the dump of the old and new
transition tables a bit more verbose. A nit from me, in this case
(the crash still reproduces with the committed test, of course).
Nice investigation from you, thanks! Applied down to v13.
--
Michael