[BUG] segfault during delete
Hi hackers,
Here is a scenario that segfault during delete (with version >= 12):
create table parent (
col1 text primary key
);
create table child (
col1 text primary key,
FOREIGN KEY (col1) REFERENCES parent(col1) on delete cascade
);
CREATE or replace FUNCTION trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
raise notice 'trigger = %, old table = %',
TG_NAME,
(select string_agg(old_table::text, ', ' order by col1) from
old_table);
return NULL;
END;
$$
;
create trigger bdt_trigger after delete on child REFERENCING OLD TABLE
AS old_table for each statement EXECUTE function trigger_function();
alter table child add column col2 text not null default 'tutu';
insert into parent(col1) values ('1');
insert into child(col1) values ('1');
insert into parent(col1) values ('2');
insert into child(col1) values ('2');
delete from parent;
produces:
CREATE TABLE
CREATE TABLE
CREATE FUNCTION
CREATE TRIGGER
ALTER TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
psql:./segfault.repro.sql:35: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:./segfault.repro.sql:35: fatal: connection to server was lost
The column being added to the child relation needs to have a default
value for the segfault to be triggered.
The stack is the following:
#0 0x000000000049aa59 in execute_attr_map_slot (attrMap=0x11736c8,
in_slot=0x1179c90, out_slot=0x1179da8) at tupconvert.c:193
#1 0x0000000000700ec0 in AfterTriggerSaveEvent (estate=0x1171820,
relinfo=0x1171cc8, event=1, row_trigger=true, oldslot=0x1179c90,
newslot=0x0, recheckIndexes=0x0, modifiedCols=0x0,
transition_capture=0x1172438) at trigger.c:5488
#2 0x00000000006fc6a8 in ExecARDeleteTriggers (estate=0x1171820,
relinfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, fdw_trigtuple=0x0,
transition_capture=0x1172438) at trigger.c:2565
#3 0x0000000000770794 in ExecDelete (mtstate=0x1171a90,
resultRelInfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, oldtuple=0x0,
planSlot=0x1173630, epqstate=0x1171b88, estate=0x1171820,
processReturning=true, canSetTag=true, changingPart=false,
tupleDeleted=0x0, epqreturnslot=0x0)
at nodeModifyTable.c:1128
#4 0x00000000007724cc in ExecModifyTable (pstate=0x1171a90) at
nodeModifyTable.c:2259
I had a look at it, and it looks to me that the slot being reused in
AfterTriggerSaveEvent() (when (map != NULL) and not (!storeslot)) has
previously been damaged by FreeExecutorState() in standard_ExecutorEnd().
Then I ended up with the enclosed patch proposal that does not make the
repro segfaulting and that is passing make check too.
Not sure that what is being done in the patch is the right/best approach
to solve the issue though.
Bertrand
PS: the same repro with a foreign key with update cascade, an update
trigger and an update on the col1 column would segfault too (but does
not with the enclosed patch proposal).
Attachments:
v1-0001-segfault-after-trigger.patchtext/plain; charset=UTF-8; name=v1-0001-segfault-after-trigger.patch; x-mac-creator=0; x-mac-type=0Download+52-0
Hi Bertrand,
On Wed, Feb 24, 2021 at 5:56 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,
Here is a scenario that segfault during delete (with version >= 12):
create table parent (
col1 text primary key
);create table child (
col1 text primary key,
FOREIGN KEY (col1) REFERENCES parent(col1) on delete cascade
);CREATE or replace FUNCTION trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
raise notice 'trigger = %, old table = %',
TG_NAME,
(select string_agg(old_table::text, ', ' order by col1) from old_table);
return NULL;
END;
$$
;create trigger bdt_trigger after delete on child REFERENCING OLD TABLE AS old_table for each statement EXECUTE function trigger_function();
alter table child add column col2 text not null default 'tutu';
insert into parent(col1) values ('1');
insert into child(col1) values ('1');
insert into parent(col1) values ('2');
insert into child(col1) values ('2');
delete from parent;produces:
CREATE TABLE
CREATE TABLE
CREATE FUNCTION
CREATE TRIGGER
ALTER TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
psql:./segfault.repro.sql:35: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:./segfault.repro.sql:35: fatal: connection to server was lostThe column being added to the child relation needs to have a default value for the segfault to be triggered.
The stack is the following:
#0 0x000000000049aa59 in execute_attr_map_slot (attrMap=0x11736c8, in_slot=0x1179c90, out_slot=0x1179da8) at tupconvert.c:193
#1 0x0000000000700ec0 in AfterTriggerSaveEvent (estate=0x1171820, relinfo=0x1171cc8, event=1, row_trigger=true, oldslot=0x1179c90, newslot=0x0, recheckIndexes=0x0, modifiedCols=0x0, transition_capture=0x1172438) at trigger.c:5488
#2 0x00000000006fc6a8 in ExecARDeleteTriggers (estate=0x1171820, relinfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, fdw_trigtuple=0x0, transition_capture=0x1172438) at trigger.c:2565
#3 0x0000000000770794 in ExecDelete (mtstate=0x1171a90, resultRelInfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, oldtuple=0x0, planSlot=0x1173630, epqstate=0x1171b88, estate=0x1171820, processReturning=true, canSetTag=true, changingPart=false, tupleDeleted=0x0, epqreturnslot=0x0)
at nodeModifyTable.c:1128
#4 0x00000000007724cc in ExecModifyTable (pstate=0x1171a90) at nodeModifyTable.c:2259I had a look at it, and it looks to me that the slot being reused in AfterTriggerSaveEvent() (when (map != NULL) and not (!storeslot)) has previously been damaged by FreeExecutorState() in standard_ExecutorEnd().
Right.
Then I ended up with the enclosed patch proposal that does not make the repro segfaulting and that is passing make check too.
Not sure that what is being done in the patch is the right/best approach to solve the issue though.
Hmm, I don't think we should be *freshly* allocating the
TupleTableSlot every time. Not having to do that is why the commit
ff11e7f4b9ae0 seems to have invented AfterTriggersTableData.storeslot
in the first place, that is, to cache the slot once created.
Problem with the current way as you've discovered is that the slot
gets allocated in the execution-span memory context, whereas the
AfterTriggersTableData instance, of which the slot is a part, is
supposed to last the entire (sub-)transaction. So the correct fix I
think is to allocate the slot to be stored in
AfterTriggersTableData.storeslot in the transaction-span memory
context as well.
Having taken care of that, another problem with the current way is
that it adds the slot to es_tupleTable by calling
ExecAllocTupleSlot(), which means that the slot will be released when
ExecResetTupleTable() is called on es_tupleTable as part of
ExecEndPlan(). That would defeat the point of allocating the slot in
the transaction-span context. So let's use MakeSingleTableSlot() to
allocate a standalone slot to be stored in
AfterTriggersTableData.storeslot and have AfterTriggerFreeQuery() call
ExecDropSingleTupleTableSlot() to release it.
PS: the same repro with a foreign key with update cascade, an update trigger and an update on the col1 column would segfault too (but does not with the enclosed patch proposal).
Actually, we also need to fix similar code in the block for populating
the transition NEW TABLE, because without doing so that block too can
crash similarly:
if (!TupIsNull(newslot) &&
((event == TRIGGER_EVENT_INSERT && insert_new_table) ||
(event == TRIGGER_EVENT_UPDATE && update_new_table)))
I've attached a patch with my suggested fixes and also test cases.
Please take a look.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v2-0001-Fix-use-after-tree-bug-with-AfterTriggersTableDat.patchapplication/octet-stream; name=v2-0001-Fix-use-after-tree-bug-with-AfterTriggersTableDat.patchDownload+162-20
Hi Amit,
On 2/24/21 9:12 AM, Amit Langote wrote:
Hi Bertrand,
On Wed, Feb 24, 2021 at 5:56 AM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi hackers,
Here is a scenario that segfault during delete (with version >= 12):
create table parent (
col1 text primary key
);create table child (
col1 text primary key,
FOREIGN KEY (col1) REFERENCES parent(col1) on delete cascade
);CREATE or replace FUNCTION trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
raise notice 'trigger = %, old table = %',
TG_NAME,
(select string_agg(old_table::text, ', ' order by col1) from old_table);
return NULL;
END;
$$
;create trigger bdt_trigger after delete on child REFERENCING OLD TABLE AS old_table for each statement EXECUTE function trigger_function();
alter table child add column col2 text not null default 'tutu';
insert into parent(col1) values ('1');
insert into child(col1) values ('1');
insert into parent(col1) values ('2');
insert into child(col1) values ('2');
delete from parent;produces:
CREATE TABLE
CREATE TABLE
CREATE FUNCTION
CREATE TRIGGER
ALTER TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
psql:./segfault.repro.sql:35: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:./segfault.repro.sql:35: fatal: connection to server was lostThe column being added to the child relation needs to have a default value for the segfault to be triggered.
The stack is the following:
#0 0x000000000049aa59 in execute_attr_map_slot (attrMap=0x11736c8, in_slot=0x1179c90, out_slot=0x1179da8) at tupconvert.c:193
#1 0x0000000000700ec0 in AfterTriggerSaveEvent (estate=0x1171820, relinfo=0x1171cc8, event=1, row_trigger=true, oldslot=0x1179c90, newslot=0x0, recheckIndexes=0x0, modifiedCols=0x0, transition_capture=0x1172438) at trigger.c:5488
#2 0x00000000006fc6a8 in ExecARDeleteTriggers (estate=0x1171820, relinfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, fdw_trigtuple=0x0, transition_capture=0x1172438) at trigger.c:2565
#3 0x0000000000770794 in ExecDelete (mtstate=0x1171a90, resultRelInfo=0x1171cc8, tupleid=0x7ffd2b7f2e40, oldtuple=0x0, planSlot=0x1173630, epqstate=0x1171b88, estate=0x1171820, processReturning=true, canSetTag=true, changingPart=false, tupleDeleted=0x0, epqreturnslot=0x0)
at nodeModifyTable.c:1128
#4 0x00000000007724cc in ExecModifyTable (pstate=0x1171a90) at nodeModifyTable.c:2259I had a look at it, and it looks to me that the slot being reused in AfterTriggerSaveEvent() (when (map != NULL) and not (!storeslot)) has previously been damaged by FreeExecutorState() in standard_ExecutorEnd().
Right.
Thanks for reviewing the issue and the patch!
Then I ended up with the enclosed patch proposal that does not make the repro segfaulting and that is passing make check too.
Not sure that what is being done in the patch is the right/best approach to solve the issue though.
Hmm, I don't think we should be *freshly* allocating the
TupleTableSlot every time. Not having to do that is why the commit
ff11e7f4b9ae0 seems to have invented AfterTriggersTableData.storeslot
in the first place, that is, to cache the slot once created.
Oh, ok thanks for the explanation.
Problem with the current way as you've discovered is that the slot
gets allocated in the execution-span memory context, whereas the
AfterTriggersTableData instance, of which the slot is a part, is
supposed to last the entire (sub-)transaction.
Right.
So the correct fix I
think is to allocate the slot to be stored in
AfterTriggersTableData.storeslot in the transaction-span memory
context as well.
Right, that makes more sense that what my patch was doing.
Having taken care of that, another problem with the current way is
that it adds the slot to es_tupleTable by calling
ExecAllocTupleSlot(), which means that the slot will be released when
ExecResetTupleTable() is called on es_tupleTable as part of
ExecEndPlan(). That would defeat the point of allocating the slot in
the transaction-span context. So let's use MakeSingleTableSlot() to
allocate a standalone slot to be stored in
AfterTriggersTableData.storeslot and have AfterTriggerFreeQuery() call
ExecDropSingleTupleTableSlot() to release it.
+1
PS: the same repro with a foreign key with update cascade, an update trigger and an update on the col1 column would segfault too (but does not with the enclosed patch proposal).
Actually, we also need to fix similar code in the block for populating
the transition NEW TABLE, because without doing so that block too can
crash similarly:if (!TupIsNull(newslot) &&
((event == TRIGGER_EVENT_INSERT && insert_new_table) ||
(event == TRIGGER_EVENT_UPDATE && update_new_table)))I've attached a patch with my suggested fixes and also test cases.
Please take a look.
I had a look and it looks good to me. Also the new regression tests are
doing it right and are segfaulting without your patch.
That's all good to me.
That fix should be back ported to 12 and 13, do you agree?
Thanks a lot for your help and explanations!
Bertrand
On Wed, Feb 24, 2021 at 6:12 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
On 2/24/21 9:12 AM, Amit Langote wrote:
I've attached a patch with my suggested fixes and also test cases.
Please take a look.I had a look and it looks good to me. Also the new regression tests are
doing it right and are segfaulting without your patch.That's all good to me.
Thanks.
That fix should be back ported to 12 and 13, do you agree?
Yeah, I think so.
--
Amit Langote
EDB: http://www.enterprisedb.com
On 2/24/21 1:59 PM, Amit Langote wrote:
On Wed, Feb 24, 2021 at 6:12 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
On 2/24/21 9:12 AM, Amit Langote wrote:
I've attached a patch with my suggested fixes and also test cases.
Please take a look.I had a look and it looks good to me. Also the new regression tests are
doing it right and are segfaulting without your patch.That's all good to me.
Thanks.
That fix should be back ported to 12 and 13, do you agree?
Yeah, I think so.
great, patch added to the CF.
Bertrand
Thanks Amit for working on this fix! It seems correct to me, so I pushed it with trivial changes. Thanks Bertrand for reporting the problem.
In addition to backpatching the code fix to pg12, I backpatched the test case to pg11. It worked fine for me (with no code changes), but it seems good to have it there just to make sure the buildfarm agrees with us on this.
On Sun, Feb 28, 2021 at 6:14 AM Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Thanks Amit for working on this fix! It seems correct to me, so I pushed it with trivial changes. Thanks Bertrand for reporting the problem.
Thanks Alvaro.
In addition to backpatching the code fix to pg12, I backpatched the test case to pg11. It worked fine for me (with no code changes), but it seems good to have it there just to make sure the buildfarm agrees with us on this.
Ah, good call.
--
Amit Langote
EDB: http://www.enterprisedb.com
Thanks Amit and Alvaro for having helped to fix this bug so quickly.
Bertrand
Show quoted text
On 3/1/21 3:22 AM, Amit Langote wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
On Sun, Feb 28, 2021 at 6:14 AM Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Thanks Amit for working on this fix! It seems correct to me, so I pushed it with trivial changes. Thanks Bertrand for reporting the problem.
Thanks Alvaro.
In addition to backpatching the code fix to pg12, I backpatched the test case to pg11. It worked fine for me (with no code changes), but it seems good to have it there just to make sure the buildfarm agrees with us on this.
Ah, good call.
--
Amit Langote
EDB: http://www.enterprisedb.com