Fix for a crash caused by triggers in cross-partition updates
Hello.
We received a crash report related to cross-partition updates
involving multiple triggers.
After investigating the situation, we found that the crash occurs by
the following steps in PostgreSQL versions 11 through 14 (We have not
checked versions 10 and earlier.):
create table tp(a int) partition by range (a);
create table tc1 partition of tp for values from (0) to (10);
create table tc2 partition of tp for values from (10) to (20);
insert into tp values (1);
create or replace function trg() returns trigger as $$BEGIN RETURN NULL; END; $$ language plpgsql;
create trigger td after delete on tp referencing old table old for each statement execute function trg();
create trigger tu after update on tp referencing new table new for each statement execute function trg();
update tp set a = 11 where a = 1;
<crash>
In this scenario, AfterTriggerSaveEvent is passed an inconsistent
TransitionCaptureState. Specifically, it indicates that
delete_old_table is required, but the corresponding tuplestore
(old_tuplestore) is not actually provided, leading to a crash. This
issue occurs because the function indiscriminately sets the
.tcs_*_old/new_table flags regardless of cmdType. The inconsistency
results in a crash during ExecCrossPartitionUpdate, where ExecDelete
is invoked as part of an UPDATE command.
The crash appears to have been accidentally fixed in commit
3a46a45f6f0 by simply skipping tuple storage when a tuplestore is not
available (in TransitionTableAddTuple). However, I believe that
MakeTransitionCaptureState returning an inconsistent
TransitionCaptureState is fundamentally problematic.
The attached first patch adds a test case that causes the crash in
versions 11 through 14 but executes successfully in version 15 and
later.
The second patch provides a fix for versions 11 to 14, and the third
patch applies the same fix to version 15 and later, purely for
consistency with the first patch. These two patches are designed so
that the differences between the code after applying each of them
remain small.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Add-test-for-a-crash-bug.patchtext/x-patch; charset=us-asciiDownload+98-1
0002-Fix-MakeTransitionCaptureState-to-return-a-consisten_11-14.patchtext/x-patch; charset=us-asciiDownload+20-17
0002-Fix-MakeTransitionCaptureState-to-return-a-consisten_15-master.patchtext/x-patch; charset=us-asciiDownload+4-5
On Fri, Feb 07, 2025 at 03:02:38PM +0900, Kyotaro Horiguchi wrote:
We received a crash report related to cross-partition updates
involving multiple triggers.After investigating the situation, we found that the crash occurs by
the following steps in PostgreSQL versions 11 through 14 (We have not
checked versions 10 and earlier.):create table tp(a int) partition by range (a);
create table tc1 partition of tp for values from (0) to (10);
create table tc2 partition of tp for values from (10) to (20);
insert into tp values (1);
create or replace function trg() returns trigger as $$BEGIN RETURN NULL; END; $$ language plpgsql;
create trigger td after delete on tp referencing old table old for each statement execute function trg();
create trigger tu after update on tp referencing new table new for each statement execute function trg();update tp set a = 11 where a = 1;
<crash>
Just to answer your question, on v10 it fails differently (also
replace "function" by "procedure" in the CREATE TRIGGER queries):
ERROR: 23514: new row for relation "tc1" violates partition constraint
DETAIL: Failing row contains (11).
Anyway, support for triggers on partitioned tables has been added in
v11, so this has never really worked on this branch. :D
The crash appears to have been accidentally fixed in commit
3a46a45f6f0 by simply skipping tuple storage when a tuplestore is not
available (in TransitionTableAddTuple). However, I believe that
MakeTransitionCaptureState returning an inconsistent
TransitionCaptureState is fundamentally problematic.
Hmm.. Agreed, it seems that you are right in the way of taking care
of this inconsistency. That's interesting. I would need to look at
that more closely with a couple of hours head down. It's a bit late
in the week here so that's not going to happen today. Note that we
have a release coming next week and all stable branches should not be
touched, but perhaps somebody will be able to beat me here.
The attached first patch adds a test case that causes the crash in
versions 11 through 14 but executes successfully in version 15 and
later.
Yeah, agreed to add these tests on all the branches, even 15~.
--
Michael
On Fri, Feb 07, 2025 at 03:47:02PM +0900, Michael Paquier wrote:
Hmm.. Agreed, it seems that you are right in the way of taking care
of this inconsistency. That's interesting. I would need to look at
that more closely with a couple of hours head down. It's a bit late
in the week here so that's not going to happen today. Note that we
have a release coming next week and all stable branches should not be
touched, but perhaps somebody will be able to beat me here.
Put my head down for a couple of hours on this one to untangle these
bits of states with the old and new transition tables, and your
suggestions look right, including the tweaks in 15~HEAD for the flag
values to keep the branches a bit more consistent.
One thing that itched me a bit was in the test, where I found a bit
more useful to have two values in the partitions on top of the value
moved to a different partition, making the dump of the old and new
transition tables a bit more verbose. A nit from me, in this case
(the crash still reproduces with the committed test, of course).
Nice investigation from you, thanks! Applied down to v13.
--
Michael