BUG #18830: ExecInitMerge Segfault on MERGE
The following bug has been logged on the website:
Bug reference: 18830
Logged by: Robins Tharakan
Email address: tharakan@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system: Ubuntu
Description:
With this SQL (backtraces below), postgres starts to segfault starting at
cbc127917e.
SQL
===
CREATE AGGREGATE d(double precision ORDER BY anyelement) (
SFUNC = ordered_set_transition_multi,
STYPE = internal,
FINALFUNC = rank_final);
CREATE TABLE e (
a text,
b integer
) PARTITION BY LIST (a);
CREATE TABLE f (
a text,
b integer
);
CREATE TABLE g (
a text,
b integer
);
CREATE VIEW h AS
SELECT XMLSERIALIZE(DOCUMENT '<foo><bar>42</bar></foo>'AS text );
ALTER TABLE e ATTACH PARTITION f FOR VALUES IN ('a');
ALTER TABLE e ATTACH PARTITION g FOR VALUES IN ('b');
MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
SQL Output
==========
$ psql -p 9999 postgres -f ../sqith/repro_final.sql
CREATE AGGREGATE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE VIEW
ALTER TABLE
ALTER TABLE
psql:../sqith/repro_final.sql:25: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:../sqith/repro_final.sql:25: error: connection to server was lost
Error Log
=========
2025-03-03 06:55:54.014 ACDT [137463] LOG: client backend (PID 137479) was
terminated by signal 11: Segmentation fault
2025-03-03 06:55:54.014 ACDT [137463] DETAIL: Failed process was running:
MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
2025-03-03 06:55:54.014 ACDT [137463] LOG: terminating any other active
server processes
Commit
======
Checking (15a79c73111~0) - 15a79c7311 - fail
Checking (15a79c73111~10) - 424ededc58 - fail
Checking (15a79c73111~30) - 945a9e3832 - fail
Checking (15a79c73111~70) - a4e986ef5a - fail
Checking (15a79c73111~150) - 6a2275b895 - fail
Checking (15a79c73111~310) - 117f9f328e - Pass
Checking (15a79c73111~230) - c89525d57b - Pass
Checking (15a79c73111~190) - c366d2bdba - fail
Checking (15a79c73111~210) - fb056564ec - fail
Checking (15a79c73111~220) - 44ec095751 - Pass
Checking (15a79c73111~215) - cbc127917e - fail
Checking (15a79c73111~217) - 428fadb7e9 - Pass
Checking (15a79c73111~216) - 926c7fce03 - Pass
Offending Commit is cbc127917e - HEAD~215
BackTrace
=========
(gdb) bt
#0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at
nodeModifyTable.c:3663
#1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889
#2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at execProcnode.c:177
#3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at
execMain.c:985
#4 0x0000599c2c9c99f0 in standard_ExecutorStart (queryDesc=0x599c6b218ba0,
eflags=0) at execMain.c:259
#5 0x0000599c2c9c96fb in ExecutorStart (queryDesc=0x599c6b218ba0, eflags=0)
at execMain.c:135
#6 0x0000599c2ccb3925 in ProcessQuery (plan=0x599c6b284188,
sourceText=0x599c6b16c4e0 "MERGE INTO e USING h ON a = xmlserialize
WHEN NOT MATCHED THEN INSERT VALUES (CAST(NULL AS text));", params=0x0,
queryEnv=0x0, dest=0x599c6b284308,
qc=0x7ffc8bc1aaf0) at pquery.c:155
#7 0x0000599c2ccb546e in PortalRunMulti (portal=0x599c6b1ef0a0,
isTopLevel=true, setHoldSnapshot=false, dest=0x599c6b284308,
altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0)
at pquery.c:1271
#8 0x0000599c2ccb497a in PortalRun (portal=0x599c6b1ef0a0,
count=9223372036854775807, isTopLevel=true, dest=0x599c6b284308,
altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0) at pquery.c:787
#9 0x0000599c2ccad440 in exec_simple_query (query_string=0x599c6b16c4e0
"MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));")
at postgres.c:1271
#10 0x0000599c2ccb27ef in PostgresMain (dbname=0x599c6b1a6680 "smithreduce",
username=0x599c6b1a6668 "smith") at postgres.c:4691
BackTrace Full
==============
#0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at
nodeModifyTable.c:3663
mergeActionList = 0x599c6b281b70
joinCondition = 0x0
relationDesc = 0x599c2c9df7cb <ExecTypeFromTL+33>
l = 0x100000000
lc__state = {l = 0x599c6b281bc0, i = 0}
node = 0x599c6b282e08
rootRelInfo = 0x599c6b260f98
resultRelInfo = 0x599c6b260f80
econtext = 0x599c6b2695f8
lc = 0x599c6b281bd8
i = 1
__func__ = "ExecInitMerge"
#1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889
mtstate = 0x599c6b260330
subplan = 0x599c6b282d78
operation = CMD_MERGE
nrels = 0
resultRelations = 0x0
withCheckOptionLists = 0x0
returningLists = 0x0
updateColnosLists = 0x0
resultRelInfo = 0x599c6b260f80
arowmarks = 0x0
l = 0x0
i = 0
rel = 0x7be2ba49e368
__func__ = "ExecInitModifyTable"
#2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at execProcnode.c:177
result = 0x0
subps = 0x599c6b282760
l = 0x599c6b2608c0
__func__ = "ExecInitNode"
#3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at
execMain.c:985
operation = CMD_MERGE
plannedstmt = 0x599c6b284188
plan = 0x599c6b282e08
rangeTable = 0x599c6b283228
estate = 0x599c6b260080
planstate = 0x599c6b1ab310
tupType = 0x599c6b2011f0
l = 0x0
i = 1
__func__ = "InitPlan"
This (and bug #18828) found using SQLSmith / SQLReduce / creduce.
PG Bug reporting form <noreply@postgresql.org> 于2025年3月3日周一 18:31写道:
The following bug has been logged on the website:
Bug reference: 18830
Logged by: Robins Tharakan
Email address: tharakan@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system: Ubuntu
Description:With this SQL (backtraces below), postgres starts to segfault starting at
cbc127917e.SQL
===
CREATE AGGREGATE d(double precision ORDER BY anyelement) (
SFUNC = ordered_set_transition_multi,
STYPE = internal,
FINALFUNC = rank_final);
CREATE TABLE e (
a text,
b integer
) PARTITION BY LIST (a);
CREATE TABLE f (
a text,
b integer
);
CREATE TABLE g (
a text,
b integer
);
CREATE VIEW h AS
SELECT XMLSERIALIZE(DOCUMENT '<foo><bar>42</bar></foo>'AS text );
ALTER TABLE e ATTACH PARTITION f FOR VALUES IN ('a');
ALTER TABLE e ATTACH PARTITION g FOR VALUES IN ('b');MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));SQL Output
==========
$ psql -p 9999 postgres -f ../sqith/repro_final.sql
CREATE AGGREGATE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE VIEW
ALTER TABLE
ALTER TABLE
psql:../sqith/repro_final.sql:25: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:../sqith/repro_final.sql:25: error: connection to server was lostError Log
=========
2025-03-03 06:55:54.014 ACDT [137463] LOG: client backend (PID 137479) was
terminated by signal 11: Segmentation fault
2025-03-03 06:55:54.014 ACDT [137463] DETAIL: Failed process was running:
MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
2025-03-03 06:55:54.014 ACDT [137463] LOG: terminating any other active
server processesCommit
======
Checking (15a79c73111~0) - 15a79c7311 - fail
Checking (15a79c73111~10) - 424ededc58 - fail
Checking (15a79c73111~30) - 945a9e3832 - fail
Checking (15a79c73111~70) - a4e986ef5a - fail
Checking (15a79c73111~150) - 6a2275b895 - fail
Checking (15a79c73111~310) - 117f9f328e - Pass
Checking (15a79c73111~230) - c89525d57b - Pass
Checking (15a79c73111~190) - c366d2bdba - fail
Checking (15a79c73111~210) - fb056564ec - fail
Checking (15a79c73111~220) - 44ec095751 - Pass
Checking (15a79c73111~215) - cbc127917e - fail
Checking (15a79c73111~217) - 428fadb7e9 - Pass
Checking (15a79c73111~216) - 926c7fce03 - Pass
Offending Commit is cbc127917e - HEAD~215BackTrace
=========
(gdb) bt
#0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at
nodeModifyTable.c:3663
#1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889
#2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at execProcnode.c:177
#3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at
execMain.c:985
#4 0x0000599c2c9c99f0 in standard_ExecutorStart (queryDesc=0x599c6b218ba0,
eflags=0) at execMain.c:259
#5 0x0000599c2c9c96fb in ExecutorStart (queryDesc=0x599c6b218ba0,
eflags=0)
at execMain.c:135
#6 0x0000599c2ccb3925 in ProcessQuery (plan=0x599c6b284188,
sourceText=0x599c6b16c4e0 "MERGE INTO e USING h ON a = xmlserialize
WHEN NOT MATCHED THEN INSERT VALUES (CAST(NULL AS text));", params=0x0,
queryEnv=0x0, dest=0x599c6b284308,
qc=0x7ffc8bc1aaf0) at pquery.c:155
#7 0x0000599c2ccb546e in PortalRunMulti (portal=0x599c6b1ef0a0,
isTopLevel=true, setHoldSnapshot=false, dest=0x599c6b284308,
altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0)
at pquery.c:1271
#8 0x0000599c2ccb497a in PortalRun (portal=0x599c6b1ef0a0,
count=9223372036854775807, isTopLevel=true, dest=0x599c6b284308,
altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0) at pquery.c:787
#9 0x0000599c2ccad440 in exec_simple_query (query_string=0x599c6b16c4e0
"MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));")
at postgres.c:1271
#10 0x0000599c2ccb27ef in PostgresMain (dbname=0x599c6b1a6680
"smithreduce",
username=0x599c6b1a6668 "smith") at postgres.c:4691BackTrace Full
==============
#0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at
nodeModifyTable.c:3663
mergeActionList = 0x599c6b281b70
joinCondition = 0x0
relationDesc = 0x599c2c9df7cb <ExecTypeFromTL+33>
l = 0x100000000
lc__state = {l = 0x599c6b281bc0, i = 0}
node = 0x599c6b282e08
rootRelInfo = 0x599c6b260f98
resultRelInfo = 0x599c6b260f80
econtext = 0x599c6b2695f8
lc = 0x599c6b281bd8
i = 1
__func__ = "ExecInitMerge"
#1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889
mtstate = 0x599c6b260330
subplan = 0x599c6b282d78
operation = CMD_MERGE
nrels = 0
resultRelations = 0x0
withCheckOptionLists = 0x0
returningLists = 0x0
updateColnosLists = 0x0
resultRelInfo = 0x599c6b260f80
arowmarks = 0x0
l = 0x0
i = 0
rel = 0x7be2ba49e368
__func__ = "ExecInitModifyTable"
#2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at execProcnode.c:177
result = 0x0
subps = 0x599c6b282760
l = 0x599c6b2608c0
__func__ = "ExecInitNode"
#3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at
execMain.c:985
operation = CMD_MERGE
plannedstmt = 0x599c6b284188
plan = 0x599c6b282e08
rangeTable = 0x599c6b283228
estate = 0x599c6b260080
planstate = 0x599c6b1ab310
tupType = 0x599c6b2011f0
l = 0x0
i = 1
__func__ = "InitPlan"This (and bug #18828) found using SQLSmith / SQLReduce / creduce.
Thanks for reporting.
I can reproduce this crash. On HEAD 3f1db99bfa, crash on a different
place due to the commit 75dfde1.
But cbc127917e is the root cause commit.
MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
The plan of the above query looks as below:
QUERY PLAN
------------------------------------------------------------------
Merge on e (cost=0.00..58.29 rows=0 width=0)
-> Nested Loop Left Join (cost=0.00..58.29 rows=12 width=10)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Append (cost=0.00..58.16 rows=12 width=10)
Subplans Removed: 2
You can see that all partitions are pruned. After cbc127917e, we only
consider unpruned relations, then the list resultRelations is empty.
the estate->es_unpruned_relids contains (1,2), the node->resultRelations
contains (5,6).
nrels = list_length(resultRelations);
...
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));
The memory of mtstate->resultRelInfo point to is undefined. When we access
its memory in ExecInitMerge(),
relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
crash happened.
After 75dfde1, mergeActionLists list is empty, so in ExecInitMerge() we
don't enter the foreach(lc, mergeActionLists),
and the mtstate->ps.ps_ExprContext has no chance to initialize. So
in ExecMergeNotMatched(), econtext is NULL.
econtext->ecxt_scantuple = NULL;
The above statement can trigger a segment fault.
For Merge command NOT MATCH, should we need the logic that only consider
unpruned relations in ExecInitModifyTable()?
Merge command seems more complex than update and delete. Can we consider
the unpruned relations in ExecInitModifyTable()
according to the command type. For merge, we do the logic before
cbc127917e, for now?
--
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2025年3月3日周一 23:46写道:
Thanks for reporting.
I can reproduce this crash. On HEAD 3f1db99bfa, crash on a different
place due to the commit 75dfde1.
But cbc127917e is the root cause commit.MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
The plan of the above query looks as below:
QUERY PLAN
------------------------------------------------------------------
Merge on e (cost=0.00..58.29 rows=0 width=0)
-> Nested Loop Left Join (cost=0.00..58.29 rows=12 width=10)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Append (cost=0.00..58.16 rows=12 width=10)
Subplans Removed: 2You can see that all partitions are pruned. After cbc127917e, we only
consider unpruned relations, then the list resultRelations is empty.
the estate->es_unpruned_relids contains (1,2), the node->resultRelations
contains (5,6).nrels = list_length(resultRelations);
...
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));The memory of mtstate->resultRelInfo point to is undefined. When we access
its memory in ExecInitMerge(),relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
crash happened.
After 75dfde1, mergeActionLists list is empty, so in ExecInitMerge() we
don't enter the foreach(lc, mergeActionLists),
and the mtstate->ps.ps_ExprContext has no chance to initialize. So
in ExecMergeNotMatched(), econtext is NULL.
econtext->ecxt_scantuple = NULL;
The above statement can trigger a segment fault.For Merge command NOT MATCH, should we need the logic that only consider
unpruned relations in ExecInitModifyTable()?Merge command seems more complex than update and delete. Can we consider
the unpruned relations in ExecInitModifyTable()
according to the command type. For merge, we do the logic before
cbc127917e, for now?
I found the partition_prune.sql does not cover merge into ... not match
case, and I found an easy reproduce step, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
--
Thanks,
Tender Wang
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.
There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.
I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.
David
Attachments:
ExecInitMerge_fix.patchapplication/octet-stream; name=ExecInitMerge_fix.patchDownload+3-2
On Tue, Mar 4, 2025 at 3:00 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.
Thanks for taking a look and the patch. I was thinking #1 too.
Though I was thinking of doing the ps_ExprContext initialization in
ExecInitModifyTable() before calling ExecInitMerge(), to be similar to
what we do for other ModifyTable properties that need one.
--
Thanks, Amit Langote
David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
I found the partition_prune.sql does not cover merge into ... not match
case, and I found an easy reproduce step, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on(true) on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.
Hmm, apply your patch, I get different results when set
enable_partition_pruning = off, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 0
postgres=# set enable_partition_pruning = off;
SET
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 1
if all partitions are pruned, the resultRelation's mergeActions has no
chance to be set.
--
Thanks,
Tender Wang
On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote:
David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.Hmm, apply your patch, I get different results when set enable_partition_pruning = off, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 0
postgres=# set enable_partition_pruning = off;
SET
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 1
Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道:
On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote:
David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
I found the partition_prune.sql does not cover merge into ... not
match case, and I found an easy reproduce step, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on(true) on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.Hmm, apply your patch, I get different results when set
enable_partition_pruning = off, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 0
postgres=# set enable_partition_pruning = off;
SET
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 1Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.The test case comes from partition_prune.sql, but now we don't cover merge
.. not match.
create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with
check option;
create function stable_one() returns int as $$ begin return 1; end; $$
language plpgsql stable;
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
--
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道:
Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.
I do test on 17.1, the results are consistent. See below:
psql (17.1)
Type "help" for help.
postgres=# create table part_abc (a int, b text, c bool) partition by list
(a);
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with
check option;
create function stable_one() returns int as $$ begin return 1; end; $$
language plpgsql stable;
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
ALTER TABLE
INSERT 0 1
INSERT 0 1
CREATE VIEW
CREATE FUNCTION
MERGE 1
postgres=# show enable_partition_pruning ;
enable_partition_pruning
--------------------------
on
(1 row)
postgres=# create database test;
CREATE DATABASE
postgres=# \c test
You are now connected to database "test" as user "ecs-user".
test=# set enable_partition_pruning = off;
SET
test=# show enable_partition_pruning ;
enable_partition_pruning
--------------------------
off
(1 row)
test=# create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with
check option;
create function stable_one() returns int as $$ begin return 1; end; $$
language plpgsql stable;
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
ALTER TABLE
INSERT 0 1
INSERT 0 1
CREATE VIEW
CREATE FUNCTION
MERGE 1
--
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道:
On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote:
David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
I found the partition_prune.sql does not cover merge into ... not
match case, and I found an easy reproduce step, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on(true) on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.Hmm, apply your patch, I get different results when set
enable_partition_pruning = off, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 0
postgres=# set enable_partition_pruning = off;
SET
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 1Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.
Because all partitions are pruned, so bms_is_member(rti,
estate->es_unpruned_relids) is false, then mergeActionLists is empty.
Even though we initialize econtext to not null, but
in ExecMergeNotMatched(),
because actionStates list is empty, so no merge operation happens.
actionStates is empty due to empty mergeActionLists.
--
Thanks,
Tender Wang
On Tue, Mar 4, 2025 at 4:36 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道:
On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote:
David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.Hmm, apply your patch, I get different results when set enable_partition_pruning = off, seeing below:
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 0
postgres=# set enable_partition_pruning = off;
SET
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 1Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.Because all partitions are pruned, so bms_is_member(rti,
estate->es_unpruned_relids) is false, then mergeActionLists is empty.
Even though we initialize econtext to not null, but in ExecMergeNotMatched(),
because actionStates list is empty, so no merge operation happens.
actionStates is empty due to empty mergeActionLists.
Yes, I get that. What I am wondering about is whether the case you
presented where disabling partition pruning gives a different result
for the same query has anything to do with this particular bug that's
causing the crash Robins reported. If there's a separate bug, we have
to check if it is present in v17 and older.
--
Thanks, Amit Langote
On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote:
nrels = list_length(resultRelations);
...
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(),
This needs to be fixed saparately.
relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
crash happened.
Do you have a case where this access to undefined
ModifyTableState.resultRelInfo occurs? I would have thought that it
should not happen.
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 19:51写道:
On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote:
nrels = list_length(resultRelations);
...
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));The memory of mtstate->resultRelInfo point to is undefined. When we
access its memory in ExecInitMerge(),
This needs to be fixed saparately.
relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
crash happened.
Do you have a case where this access to undefined
ModifyTableState.resultRelInfo occurs? I would have thought that it
should not happen.
"undefined" may not be accurate, "invalid" seems more correct.
I still use the case:
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
(gdb) p *resultRelInfo
$5 = {type = 2139062142, ri_RangeTableIndex = 2139062143, ri_RelationDesc =
0x180, ri_NumIndices = 195, ri_IndexRelationDescs = 0x500000182,
ri_IndexRelationInfo = 0x7f8773b2aa38, ri_RowIdAttNo = 0,
ri_extraUpdatedCols = 0x0, ri_projectNew = 0x0,
ri_newTupleSlot = 0x0, ri_oldTupleSlot = 0x0, ri_projectNewInfoValid =
false, ri_needLockTagTuple = false, ri_TrigDesc = 0x0, ri_TrigFunctions =
0x0, ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_ReturningSlot =
0x0, ri_TrigOldSlot = 0x0,
ri_TrigNewSlot = 0x0, ri_AllNullSlot = 0x0, ri_FdwRoutine = 0x0,
ri_FdwState = 0x0, ri_usesFdwDirectModify = false, ri_NumSlots = 0,
ri_NumSlotsInitialized = 0, ri_BatchSize = 0, ri_Slots = 0x0, ri_PlanSlots
= 0x0, ri_WithCheckOptions = 0x0,
ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0,
ri_GeneratedExprsI = 0x0, ri_GeneratedExprsU = 0x0, ri_NumGeneratedNeededI
= 0, ri_NumGeneratedNeededU = 0, ri_returningList = 0x0,
ri_projectReturning = 0x0,
ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0, ri_MergeActions =
{0x0, 0x0, 0x0}, ri_MergeJoinCondition = 0x0, ri_PartitionCheckExpr = 0x0,
ri_ChildToRootMap = 0x0, ri_ChildToRootMapValid = false, ri_RootToChildMap
= 0x0,
ri_RootToChildMapValid = false, ri_RootResultRelInfo = 0x0,
ri_PartitionTupleSlot = 0x0, ri_CopyMultiInsertBuffer = 0x0,
ri_ancestorResultRels = 0x0}
ri_RelationDesc = 0x180, this address is not invalid.
--
Thanks,
Tender Wang
On Tue, Mar 4, 2025 at 5:36 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 19:51写道:
On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote:
nrels = list_length(resultRelations);
...
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(),
This needs to be fixed saparately.
relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
crash happened.
Do you have a case where this access to undefined
ModifyTableState.resultRelInfo occurs? I would have thought that it
should not happen."undefined" may not be accurate, "invalid" seems more correct.
I still use the case:
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);(gdb) p *resultRelInfo
$5 = {type = 2139062142, ri_RangeTableIndex = 2139062143, ri_RelationDesc = 0x180, ri_NumIndices = 195, ri_IndexRelationDescs = 0x500000182, ri_IndexRelationInfo = 0x7f8773b2aa38, ri_RowIdAttNo = 0, ri_extraUpdatedCols = 0x0, ri_projectNew = 0x0,
ri_newTupleSlot = 0x0, ri_oldTupleSlot = 0x0, ri_projectNewInfoValid = false, ri_needLockTagTuple = false, ri_TrigDesc = 0x0, ri_TrigFunctions = 0x0, ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_ReturningSlot = 0x0, ri_TrigOldSlot = 0x0,
ri_TrigNewSlot = 0x0, ri_AllNullSlot = 0x0, ri_FdwRoutine = 0x0, ri_FdwState = 0x0, ri_usesFdwDirectModify = false, ri_NumSlots = 0, ri_NumSlotsInitialized = 0, ri_BatchSize = 0, ri_Slots = 0x0, ri_PlanSlots = 0x0, ri_WithCheckOptions = 0x0,
ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0, ri_GeneratedExprsI = 0x0, ri_GeneratedExprsU = 0x0, ri_NumGeneratedNeededI = 0, ri_NumGeneratedNeededU = 0, ri_returningList = 0x0, ri_projectReturning = 0x0,
ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0, ri_MergeActions = {0x0, 0x0, 0x0}, ri_MergeJoinCondition = 0x0, ri_PartitionCheckExpr = 0x0, ri_ChildToRootMap = 0x0, ri_ChildToRootMapValid = false, ri_RootToChildMap = 0x0,
ri_RootToChildMapValid = false, ri_RootResultRelInfo = 0x0, ri_PartitionTupleSlot = 0x0, ri_CopyMultiInsertBuffer = 0x0, ri_ancestorResultRels = 0x0}ri_RelationDesc = 0x180, this address is not invalid.
Does the query crash or are you just able to see invalid memory
address using gdb?
--
Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 20:30写道:
On Tue, Mar 4, 2025 at 5:36 PM Tender Wang <tndrwang@gmail.com> wrote:
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 19:51写道:
On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote:
nrels = list_length(resultRelations);
...
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));The memory of mtstate->resultRelInfo point to is undefined. When we
access its memory in ExecInitMerge(),
This needs to be fixed saparately.
relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
crash happened.
Do you have a case where this access to undefined
ModifyTableState.resultRelInfo occurs? I would have thought that it
should not happen."undefined" may not be accurate, "invalid" seems more correct.
I still use the case:
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);(gdb) p *resultRelInfo
$5 = {type = 2139062142, ri_RangeTableIndex = 2139062143,ri_RelationDesc = 0x180, ri_NumIndices = 195, ri_IndexRelationDescs =
0x500000182, ri_IndexRelationInfo = 0x7f8773b2aa38, ri_RowIdAttNo = 0,
ri_extraUpdatedCols = 0x0, ri_projectNew = 0x0,ri_newTupleSlot = 0x0, ri_oldTupleSlot = 0x0, ri_projectNewInfoValid =
false, ri_needLockTagTuple = false, ri_TrigDesc = 0x0, ri_TrigFunctions =
0x0, ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_ReturningSlot =
0x0, ri_TrigOldSlot = 0x0,ri_TrigNewSlot = 0x0, ri_AllNullSlot = 0x0, ri_FdwRoutine = 0x0,
ri_FdwState = 0x0, ri_usesFdwDirectModify = false, ri_NumSlots = 0,
ri_NumSlotsInitialized = 0, ri_BatchSize = 0, ri_Slots = 0x0, ri_PlanSlots
= 0x0, ri_WithCheckOptions = 0x0,ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0,
ri_GeneratedExprsI = 0x0, ri_GeneratedExprsU = 0x0, ri_NumGeneratedNeededI
= 0, ri_NumGeneratedNeededU = 0, ri_returningList = 0x0,
ri_projectReturning = 0x0,ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0,
ri_MergeActions = {0x0, 0x0, 0x0}, ri_MergeJoinCondition = 0x0,
ri_PartitionCheckExpr = 0x0, ri_ChildToRootMap = 0x0,
ri_ChildToRootMapValid = false, ri_RootToChildMap = 0x0,ri_RootToChildMapValid = false, ri_RootResultRelInfo = 0x0,
ri_PartitionTupleSlot = 0x0, ri_CopyMultiInsertBuffer = 0x0,
ri_ancestorResultRels = 0x0}ri_RelationDesc = 0x180, this address is not invalid.
Does the query crash or are you just able to see invalid memory
address using gdb?
git reset --hard cbc127917e
The query will crash on relationDesc =
RelationGetDescr(resultRelInfo->ri_RelationDesc);
#0 ExecInitMerge (mtstate=0x55b08fc19a00, estate=0x55b08fc18c20) at
nodeModifyTable.c:3663
3663 relationDesc =
RelationGetDescr(resultRelInfo->ri_RelationDesc);
(gdb) bt
#0 ExecInitMerge (mtstate=0x55b08fc19a00, estate=0x55b08fc18c20) at
nodeModifyTable.c:3663
#1 0x000055b07f1046af in ExecInitModifyTable (node=0x55b08fc3d338,
estate=0x55b08fc18c20, eflags=0) at nodeModifyTable.c:4889
#2 0x000055b07f0bdd77 in ExecInitNode (node=0x55b08fc3d338,
estate=0x55b08fc18c20, eflags=0) at execProcnode.c:177
#3 0x000055b07f0b26a9 in InitPlan (queryDesc=0x55b08fbe4990, eflags=0) at
execMain.c:985
#4 0x000055b07f0b1435 in standard_ExecutorStart (queryDesc=0x55b08fbe4990,
eflags=0) at execMain.c:259
#5 0x000055b07f0b1143 in ExecutorStart (queryDesc=0x55b08fbe4990,
eflags=0) at execMain.c:135
#6 0x000055b07f395fab in ProcessQuery (plan=0x55b08fc3f0f8,
sourceText=0x55b08fb19420 "merge into part_abc_view pt\nusing (select
stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)\non pt.a =
stable_one() +2\nwhen not matched then insert values(1, 'd', false);",
params=0x0, queryEnv=0x0,
dest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:155
#7 0x000055b07f397af1 in PortalRunMulti (portal=0x55b08fb99c40,
isTopLevel=true, setHoldSnapshot=false, dest=0x55b08fc3f278,
altdest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:1271
#8 0x000055b07f396ffd in PortalRun (portal=0x55b08fb99c40,
count=9223372036854775807, isTopLevel=true, dest=0x55b08fc3f278,
altdest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:787
#9 0x000055b07f38fafd in exec_simple_query (
query_string=0x55b08fb19420 "merge into part_abc_view pt\nusing (select
stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)\non pt.a =
stable_one() +2\nwhen not matched then insert values(1, 'd', false);") at
postgres.c:1271
#10 0x000055b07f394e65 in PostgresMain (dbname=0x55b08fb533e8 "postgres",
username=0x55b08fb533c8 "ecs-user") at postgres.c:4691
#11 0x000055b07f38b71a in BackendMain (startup_data=0x7ffc72bad4b0 "",
startup_data_len=4) at backend_startup.c:107
#12 0x000055b07f298873 in postmaster_child_launch (child_type=B_BACKEND,
child_slot=1, startup_data=0x7ffc72bad4b0 "", startup_data_len=4,
client_sock=0x7ffc72bad510) at launch_backend.c:274
#13 0x000055b07f29f1ce in BackendStartup (client_sock=0x7ffc72bad510) at
postmaster.c:3519
#14 0x000055b07f29c7ec in ServerLoop () at postmaster.c:1688
#15 0x000055b07f29c0e4 in PostmasterMain (argc=3, argv=0x55b08fb13a30) at
postmaster.c:1386
#16 0x000055b07f13fbd1 in main (argc=3, argv=0x55b08fb13a30) at main.c:230
But after you commit the 75dfde1, crash happened on ExecMergeNotMatched(),
econtext->ecxt_scantuple = NULL;
because econtext is null.
After 75dfde1, mergeActionLists is NIL, it directly returns in if
(mergeActionLists == NIL).
I think the crash and the inconsistent result are all caused by cbc127917e.
--
Thanks,
Tender Wang
On Tue, 4 Mar 2025 at 09:30, David Rowley <dgrowleyml@gmail.com> wrote:
It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.
Hmm, I don't think that's right. I think this is just masking the problem.
I think the real problem is that, as things stand,
ExecInitModifyTable() must never be allowed to prune every leaf
partition, because there are multiple places that rely on there being
at least one resultRelInfo.
For example, ExecModifyTable() passes the first resultRelInfo to
ExecMerge() and ExecMergeNotMatched() in the NOT MATCHED case, and
those use the action lists in that resultRelInfo to work out what
action to execute. The fact that it didn't crash with this patch is
probably just luck, because it's passing a pointer to uninitialised
memory, but it's still doing the wrong thing by not invoking any merge
actions.
Similarly, if MERGE does an INSERT on a partitioned table, the code in
ExecInitPartitionInfo() from ExecFindPartition() assumes that
mtstate->resultRelInfo has at least one entry.
So I think the simplest solution is to arrange for
ExecInitModifyTable() to always include details of at least one result
relation, adding at least one entry to each of the lists. As an
alternative, it might be possible to make the executor cope with an
empty resultRelInfo array (by using the root resultRelInfo instead,
where one is needed), but I think that would be a bigger change.
Regards,
Dean
Dean Rasheed <dean.a.rasheed@gmail.com> 于2025年3月5日周三 02:56写道:
On Tue, 4 Mar 2025 at 09:30, David Rowley <dgrowleyml@gmail.com> wrote:
It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.Hmm, I don't think that's right. I think this is just masking the problem.
I think the real problem is that, as things stand,
ExecInitModifyTable() must never be allowed to prune every leaf
partition, because there are multiple places that rely on there being
at least one resultRelInfo.For example, ExecModifyTable() passes the first resultRelInfo to
ExecMerge() and ExecMergeNotMatched() in the NOT MATCHED case, and
those use the action lists in that resultRelInfo to work out what
action to execute. The fact that it didn't crash with this patch is
probably just luck, because it's passing a pointer to uninitialised
memory, but it's still doing the wrong thing by not invoking any merge
actions.
Yeah, I agree with you. As I said before, this fix will get a wrong result
compared to
enable_partition_pruning = off, even though no crash happened again.
At least for NOT MATCH, we must have resultRelInfo entry.
Similarly, if MERGE does an INSERT on a partitioned table, the code in
ExecInitPartitionInfo() from ExecFindPartition() assumes that
mtstate->resultRelInfo has at least one entry.So I think the simplest solution is to arrange for
ExecInitModifyTable() to always include details of at least one result
relation, adding at least one entry to each of the lists.
I have not tried this way. I'm not sure about this.
As an
alternative, it might be possible to make the executor cope with an
empty resultRelInfo array (by using the root resultRelInfo instead,
where one is needed), but I think that would be a bigger change.
Yeah, this way changes a lot of codes.
--
Thanks,
Tender Wang
On Wed, Mar 5, 2025 at 3:56 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Tue, 4 Mar 2025 at 09:30, David Rowley <dgrowleyml@gmail.com> wrote:
It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.Hmm, I don't think that's right. I think this is just masking the problem.
I think the real problem is that, as things stand,
ExecInitModifyTable() must never be allowed to prune every leaf
partition, because there are multiple places that rely on there being
at least one resultRelInfo.For example, ExecModifyTable() passes the first resultRelInfo to
ExecMerge() and ExecMergeNotMatched() in the NOT MATCHED case, and
those use the action lists in that resultRelInfo to work out what
action to execute. The fact that it didn't crash with this patch is
probably just luck, because it's passing a pointer to uninitialised
memory, but it's still doing the wrong thing by not invoking any merge
actions.Similarly, if MERGE does an INSERT on a partitioned table, the code in
ExecInitPartitionInfo() from ExecFindPartition() assumes that
mtstate->resultRelInfo has at least one entry.So I think the simplest solution is to arrange for
ExecInitModifyTable() to always include details of at least one result
relation, adding at least one entry to each of the lists.
Thanks -- I’ve studied the code and I agree with the conclusion: when
all result relations are pruned, we still need to lock and process the
first one to preserve executor invariants.
Your examples with ExecMerge() and ExecInitPartitionInfo() make the
consequences of missing resultRelInfo[0] pretty clear. Locking and
including the first result relation, even if pruned, seems like the
most straightforward way to maintain correctness without deeper
structural changes.
I've come up with the attached. I'll still need to add a test case.
--
Thanks, Amit Langote
Attachments:
v1-0001-Ensure-first-result-relation-is-included-even-if-.patchapplication/octet-stream; name=v1-0001-Ensure-first-result-relation-is-included-even-if-.patchDownload+34-2
Amit Langote <amitlangote09@gmail.com> 于2025年3月12日周三 20:24写道:
Thanks -- I’ve studied the code and I agree with the conclusion: when
all result relations are pruned, we still need to lock and process the
first one to preserve executor invariants.Your examples with ExecMerge() and ExecInitPartitionInfo() make the
consequences of missing resultRelInfo[0] pretty clear. Locking and
including the first result relation, even if pruned, seems like the
most straightforward way to maintain correctness without deeper
structural changes.I've come up with the attached. I'll still need to add a test case.
It looks good to me, on a quick read-through.
We now don't have merge into ... not match ... test case in
partition_prune.sql
In [1]/messages/by-id/CAHewXN=S5nqVMqoYpFXe8OykqDC-r22vG7RvDN-VFamY7XcVTw@mail.gmail.com -- Thanks, Tender Wang, I figured out a query that could trigger a crash. It may be used as
the test case.
create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with
check option;
create function stable_one() returns int as $$ begin return 1; end; $$
language plpgsql stable;
Above SQLs are already in parition_prune.sql, only need the below SQLs.
explain (costs off)
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
[1]: /messages/by-id/CAHewXN=S5nqVMqoYpFXe8OykqDC-r22vG7RvDN-VFamY7XcVTw@mail.gmail.com -- Thanks, Tender Wang
/messages/by-id/CAHewXN=S5nqVMqoYpFXe8OykqDC-r22vG7RvDN-VFamY7XcVTw@mail.gmail.com
--
Thanks,
Tender Wang
On Wed, 12 Mar 2025 at 12:24, Amit Langote <amitlangote09@gmail.com> wrote:
Thanks -- I’ve studied the code and I agree with the conclusion: when
all result relations are pruned, we still need to lock and process the
first one to preserve executor invariants.Your examples with ExecMerge() and ExecInitPartitionInfo() make the
consequences of missing resultRelInfo[0] pretty clear. Locking and
including the first result relation, even if pruned, seems like the
most straightforward way to maintain correctness without deeper
structural changes.I've come up with the attached. I'll still need to add a test case.
Hmm, this isn't quite sufficient.
In ExecDoInitialPruning(), stmt->resultRelations contains all the
result relations plus the root relation (if set) of each ModifyTable
node in the planned stmt (see set_plan_refs()). Therefore, just adding
the first result relation in that list may not be sufficient to ensure
that every ModifyTable node ends up with at least one unpruned result
relation.
For example, consider:
create table foo (a int);
create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, c bool, a int);
create table part_abc_2 (b text, a int, c bool);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc_1 values ('b', true, 1);
insert into part_abc_2 values ('c', 2, true);
create function stable_one() returns int as
$$ begin return 1; end; $$ language plpgsql stable;
with t as (
merge into part_abc pt using (values(1)) v(a) on pt.a = stable_one() + 2
when not matched then insert values (v.a, 'd', false)
returning pt.*
)
insert into foo select a from t;
ERROR: trying to open a pruned relation
The problem here is that "foo" is the first result relation in
stmt->resultRelations, so that's the one that it chooses not to prune,
which is of no use to the merge.
Regards,
Dean