BUG #19056: ExecInitPartitionExecPruning segfault due to NULL es_part_prune_infos
The following bug has been logged on the website:
Bug reference: 19056
Logged by: Fei Changhong
Email address: feichanghong@qq.com
PostgreSQL version: 18rc1
Operating system: Operating system: centos 8,Kernel version: 5.10.13
Description:
Hi all,
I recently encountered a crash when executing a DELETE on PG18. The issue
can be reproduced on the HEAD branch with the following steps:
session1:
```sql
CREATE TABLE test_hash (
id int,
data text
) PARTITION BY HASH (id);
CREATE TABLE test_hash_0 PARTITION OF test_hash
FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE test_hash_1 PARTITION OF test_hash
FOR VALUES WITH (MODULUS 2, REMAINDER 1);
insert into test_hash select 1, '1';
begin ;
update test_hash set data = '2';
```
session2:
```sql
set plan_cache_mode to force_generic_plan;
prepare s as delete from test_hash where id = $1;
execute s(1);
```
session1:
```sql
commit;
```
The following stack trace was observed:
```
(gdb) bt 10
#0 0x000000000079de74 in list_nth (list=0x0, n=0) at
../../../src/include/nodes/pg_list.h:301
#1 0x00000000007a0eeb in ExecInitPartitionExecPruning (planstate=0x1884f40,
n_total_subplans=2, part_prune_index=0,
relids=0x18720a8, initially_valid_subplans=0x7ffc6d56b198) at
execPartition.c:1891
#2 0x00000000007bc8d0 in ExecInitAppend (node=0x1863bb0, estate=0x1884ca0,
eflags=0) at nodeAppend.c:147
#3 0x00000000007a2300 in ExecInitNode (node=0x1863bb0, estate=0x1884ca0,
eflags=0) at execProcnode.c:182
#4 0x000000000079af67 in EvalPlanQualStart (epqstate=0x186adf8,
planTree=0x1863bb0) at execMain.c:3143
#5 0x000000000079a8ab in EvalPlanQualBegin (epqstate=0x186adf8) at
execMain.c:2930
#6 0x00000000007e18e1 in ExecDelete (context=0x7ffc6d56b480,
resultRelInfo=0x186af20, tupleid=0x7ffc6d56b402,
oldtuple=0x0, processReturning=true, changingPart=false, canSetTag=true,
tmresult=0x0, tupleDeleted=0x0,
epqreturnslot=0x0) at nodeModifyTable.c:1709
#7 0x00000000007e61e2 in ExecModifyTable (pstate=0x186ad10) at
nodeModifyTable.c:4518
#8 0x00000000007a29a1 in ExecProcNodeFirst (node=0x186ad10) at
execProcnode.c:469
#9 0x00000000007959cb in ExecProcNode (node=0x186ad10) at
../../../src/include/executor/executor.h:316
```
Likely cause: EvalPlanQualStart creates a new EState without setting
es_part_prune_infos.
On Thu, 18 Sept 2025 at 03:25, PG Bug reporting form
<noreply@postgresql.org> wrote:
I recently encountered a crash when executing a DELETE on PG18. The issue
can be reproduced on the HEAD branch with the following steps:
Thanks for the report.
The first bad commit is:
commit bb3ec16e14ded1d23a46d3c7e623a965164fa124
Author: Amit Langote <amitlan@postgresql.org>
Date: Thu Jan 30 11:57:32 2025 +0900
Move PartitionPruneInfo out of plan nodes into PlannedStmt
David
On Thu, 18 Sept 2025 at 09:43, David Rowley <dgrowleyml@gmail.com> wrote:
The first bad commit is:
commit bb3ec16e14ded1d23a46d3c7e623a965164fa124
Author: Amit Langote <amitlan@postgresql.org>
Date: Thu Jan 30 11:57:32 2025 +0900Move PartitionPruneInfo out of plan nodes into PlannedStmt
I think the attached is the correct fix. I also wonder if it's worth
an isolation test to exercise this code.
David
Attachments:
19056_fix.patchapplication/octet-stream; name=19056_fix.patchDownload+9-0
On Thu, Sep 18, 2025 at 7:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 18 Sept 2025 at 09:43, David Rowley <dgrowleyml@gmail.com> wrote:
The first bad commit is:
commit bb3ec16e14ded1d23a46d3c7e623a965164fa124
Author: Amit Langote <amitlan@postgresql.org>
Date: Thu Jan 30 11:57:32 2025 +0900Move PartitionPruneInfo out of plan nodes into PlannedStmt
I think the attached is the correct fix. I also wonder if it's worth
an isolation test to exercise this code.
Thanks for the patch, David, and for the report, Fei. I indeed forgot
to update EvalPlanQualStart() in that commit.
I agree about adding an isolation test, which I have done in the
attached updated patch.
--
Thanks, Amit Langote
Attachments:
v2-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchapplication/octet-stream; name=v2-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchDownload+45-2
On Sep 18, 2025, at 09:33, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 7:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 18 Sept 2025 at 09:43, David Rowley <dgrowleyml@gmail.com> wrote:
The first bad commit is:
commit bb3ec16e14ded1d23a46d3c7e623a965164fa124
Author: Amit Langote <amitlan@postgresql.org>
Date: Thu Jan 30 11:57:32 2025 +0900Move PartitionPruneInfo out of plan nodes into PlannedStmt
I think the attached is the correct fix. I also wonder if it's worth
an isolation test to exercise this code.Thanks for the patch, David, and for the report, Fei. I indeed forgot
to update EvalPlanQualStart() in that commit.I agree about adding an isolation test, which I have done in the
attached updated patch.
Thanks David and Amit for the patch. It looks OK to me, and isolation
testing confirms the issue. Also, should s1ppx and s2ppx in the case be
renamed to s4ppx and s5ppx?
Best Regards,
Fei Changhong
On Thu, Sep 18, 2025 at 10:58 feichanghong <feichanghong@qq.com> wrote:
On Sep 18, 2025, at 09:33, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 7:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 18 Sept 2025 at 09:43, David Rowley <dgrowleyml@gmail.com> wrote:
The first bad commit is:
commit bb3ec16e14ded1d23a46d3c7e623a965164fa124
Author: Amit Langote <amitlan@postgresql.org>
Date: Thu Jan 30 11:57:32 2025 +0900Move PartitionPruneInfo out of plan nodes into PlannedStmt
I think the attached is the correct fix. I also wonder if it's worth
an isolation test to exercise this code.Thanks for the patch, David, and for the report, Fei. I indeed forgot
to update EvalPlanQualStart() in that commit.I agree about adding an isolation test, which I have done in the
attached updated patch.Thanks David and Amit for the patch. It looks OK to me, and isolation
testing confirms the issue. Also, should s1ppx and s2ppx in the case be
renamed to s4ppx and s5ppx?
Good catch, you are right, will fix. Thanks.
Show quoted text
On Thu, Sep 18, 2025 at 12:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 10:58 feichanghong <feichanghong@qq.com> wrote:
On Sep 18, 2025, at 09:33, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 7:32 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 18 Sept 2025 at 09:43, David Rowley <dgrowleyml@gmail.com> wrote:The first bad commit is:
commit bb3ec16e14ded1d23a46d3c7e623a965164fa124
Author: Amit Langote <amitlan@postgresql.org>
Date: Thu Jan 30 11:57:32 2025 +0900Move PartitionPruneInfo out of plan nodes into PlannedStmt
I think the attached is the correct fix. I also wonder if it's worth
an isolation test to exercise this code.Thanks for the patch, David, and for the report, Fei. I indeed forgot
to update EvalPlanQualStart() in that commit.I agree about adding an isolation test, which I have done in the
attached updated patch.Thanks David and Amit for the patch. It looks OK to me, and isolation
testing confirms the issue. Also, should s1ppx and s2ppx in the case be
renamed to s4ppx and s5ppx?Good catch, you are right, will fix. Thanks.
Fixed. Will push this barring objections.
--
Thanks, Amit Langote
Attachments:
v3-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchapplication/octet-stream; name=v3-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchDownload+45-2
On Thu, 18 Sept 2025 at 15:37, Amit Langote <amitlangote09@gmail.com> wrote:
Fixed. Will push this barring objections.
Thanks for adding the test case.
+# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart() +# Bug #19056
I don't think it's that useful to note down the bug number that caused
that test to be added. I think it'd be better to write something like:
"Exercise run-time partition pruning code in an EPQ plan"
If someone really wants to know why the test was added, git blame
linking back to the discussion in the commit message is the normal
workflow.
David
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 18 Sept 2025 at 15:37, Amit Langote <amitlangote09@gmail.com> wrote:
+# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart() +# Bug #19056
I don't think it's that useful to note down the bug number that caused
that test to be added.
We're inconsistent about whether we do that or not, but it's
far from un-heard-of. I just today pushed a patch in which
I did mention the bug# in the test case [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b0cc0a71e0a0a760f54c72edb8cd000e4555442b, and I did so
mostly because the adjacent test case had a similar comment.
So I see no reason to object to Amit's usage.
I think it'd be better to write something like:
"Exercise run-time partition pruning code in an EPQ plan"
Not expressing an opinion about whether that's better or
worse than Amit's lede.
regards, tom lane
On Thu, 18 Sept 2025 at 16:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
I don't think it's that useful to note down the bug number that caused
that test to be added.We're inconsistent about whether we do that or not, but it's
far from un-heard-of. I just today pushed a patch in which
I did mention the bug# in the test case [1], and I did so
mostly because the adjacent test case had a similar comment.
So I see no reason to object to Amit's usage.
The issue was introduced in v18 dev cycle, so it's never been a
problem in any production build of Postgres. I could get more on board
with an argument for noting these down if it were some long-standing
well known issue that had been around for several years which we
debated how to fix, and finally did. This isn't that, so IMO, noting
down the bug number is pretty pointless.
David
On Thu, 18 Sept 2025 at 16:19, David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 18 Sept 2025 at 16:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
We're inconsistent about whether we do that or not, but it's
far from un-heard-of. I just today pushed a patch in which
I did mention the bug# in the test case [1], and I did so
mostly because the adjacent test case had a similar comment.
So I see no reason to object to Amit's usage.The issue was introduced in v18 dev cycle, so it's never been a
problem in any production build of Postgres. I could get more on board
with an argument for noting these down if it were some long-standing
well known issue that had been around for several years which we
debated how to fix, and finally did. This isn't that, so IMO, noting
down the bug number is pretty pointless.
Just further to that, while I'm on a rant. It's just not a
sustainable practice to note down the bug numbers. If we did that for
every fix the code and tests would be strewn with random and pretty
meaningless bug numbers. It wouldn't inspire much confidence to the
casual reader of the code either as it would appear mostly to be a
patchwork of fixes. We do have a reputation for good quality. I don't
think we need to put up signs anywhere that indicate mistakes once
existed here, especially for ones that existed in no released version
of PostgreSQL.
David
David Rowley <dgrowleyml@gmail.com> writes:
... I don't
think we need to put up signs anywhere that indicate mistakes once
existed here, especially for ones that existed in no released version
of PostgreSQL.
I'm a bit bemused by that viewpoint. There's an enormous fraction of
our test suite that is exactly memorializing bugs that once existed.
Maybe there is some reason to distinguish bugs that never made it
into an official release, but that seems rather weak to me.
regards, tom lane
On Thu, Sep 18, 2025 at 1:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 18 Sept 2025 at 15:37, Amit Langote <amitlangote09@gmail.com> wrote:
+# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart() +# Bug #19056I don't think it's that useful to note down the bug number that caused
that test to be added.We're inconsistent about whether we do that or not, but it's
far from un-heard-of. I just today pushed a patch in which
I did mention the bug# in the test case [1], and I did so
mostly because the adjacent test case had a similar comment.
So I see no reason to object to Amit's usage.
I was just mimicking a few other "cf bug #" mentions in
eval-plan-qual.spec, but I'm fine to take it out if we'd prefer to
reduce that. Git blame is enough.
I think it'd be better to write something like:
"Exercise run-time partition pruning code in an EPQ plan"Not expressing an opinion about whether that's better or
worse than Amit's lede.
What I added is:
# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart()
I'm fine to change the comment to David's suggestion since that makes
the test description less narrowly tied to one fix of one specific
issue in that path.
--
Thanks, Amit Langote
On Thu, 18 Sept 2025 at 16:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
... I don't
think we need to put up signs anywhere that indicate mistakes once
existed here, especially for ones that existed in no released version
of PostgreSQL.I'm a bit bemused by that viewpoint. There's an enormous fraction of
our test suite that is exactly memorializing bugs that once existed.
Maybe there is some reason to distinguish bugs that never made it
into an official release, but that seems rather weak to me.
Not intending to sound nasty here, but if you're going to convince me
that I'm off track in my line of thought, it's going to have to be
something more convincing than "I just did this today mostly because I
copied an existing adjacent test case".
In my view, the only sense in writing down bug numbers in tests or
code is if there's some chance that we might get tricked into putting
it back the broken way again. Otherwise they just act as tombstones to
the should-be-forgotten past. It's hard to imagine someone being
convinced to rebreak this issue and delete Amit's new test.
Segfaulting is clearly wrong behaviour. Nobody will be convinced
otherwise.
David
On Thu, Sep 18, 2025 at 1:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 1:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 18 Sept 2025 at 15:37, Amit Langote <amitlangote09@gmail.com> wrote:
+# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart() +# Bug #19056I don't think it's that useful to note down the bug number that caused
that test to be added.We're inconsistent about whether we do that or not, but it's
far from un-heard-of. I just today pushed a patch in which
I did mention the bug# in the test case [1], and I did so
mostly because the adjacent test case had a similar comment.
So I see no reason to object to Amit's usage.I was just mimicking a few other "cf bug #" mentions in
eval-plan-qual.spec, but I'm fine to take it out if we'd prefer to
reduce that. Git blame is enough.I think it'd be better to write something like:
"Exercise run-time partition pruning code in an EPQ plan"Not expressing an opinion about whether that's better or
worse than Amit's lede.What I added is:
# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart()
I'm fine to change the comment to David's suggestion since that makes
the test description less narrowly tied to one fix of one specific
issue in that path.
Patch updated.
--
Thanks, Amit Langote
Attachments:
v4-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchapplication/octet-stream; name=v4-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchDownload+44-2
On Sep 18, 2025, at 14:09, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 1:56 PM Amit Langote <amitlangote09@gmail.com <mailto:amitlangote09@gmail.com>> wrote:
On Thu, Sep 18, 2025 at 1:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 18 Sept 2025 at 15:37, Amit Langote <amitlangote09@gmail.com> wrote:
+# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart() +# Bug #19056I don't think it's that useful to note down the bug number that caused
that test to be added.We're inconsistent about whether we do that or not, but it's
far from un-heard-of. I just today pushed a patch in which
I did mention the bug# in the test case [1], and I did so
mostly because the adjacent test case had a similar comment.
So I see no reason to object to Amit's usage.I was just mimicking a few other "cf bug #" mentions in
eval-plan-qual.spec, but I'm fine to take it out if we'd prefer to
reduce that. Git blame is enough.I think it'd be better to write something like:
"Exercise run-time partition pruning code in an EPQ plan"Not expressing an opinion about whether that's better or
worse than Amit's lede.What I added is:
# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart()
I'm fine to change the comment to David's suggestion since that makes
the test description less narrowly tied to one fix of one specific
issue in that path.Patch updated.
+# EState.es_part_prune_infos bug #19056
There is still bug #19056 here.
Best Regards,
Fei Changhong
On Thu, Sep 18, 2025 at 3:15 PM feichanghong <feichanghong@qq.com> wrote:
On Sep 18, 2025, at 14:09, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 1:56 PM Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 1:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
On Thu, 18 Sept 2025 at 15:37, Amit Langote <amitlangote09@gmail.com> wrote:
+# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart() +# Bug #19056I don't think it's that useful to note down the bug number that caused
that test to be added.We're inconsistent about whether we do that or not, but it's
far from un-heard-of. I just today pushed a patch in which
I did mention the bug# in the test case [1], and I did so
mostly because the adjacent test case had a similar comment.
So I see no reason to object to Amit's usage.I was just mimicking a few other "cf bug #" mentions in
eval-plan-qual.spec, but I'm fine to take it out if we'd prefer to
reduce that. Git blame is enough.I think it'd be better to write something like:
"Exercise run-time partition pruning code in an EPQ plan"Not expressing an opinion about whether that's better or
worse than Amit's lede.What I added is:
# Test that EState.es_part_prune_infos is properly set in EvalPlanQualStart()
I'm fine to change the comment to David's suggestion since that makes
the test description less narrowly tied to one fix of one specific
issue in that path.Patch updated.
+# EState.es_part_prune_infos bug #19056
Oops, coffee deficiency. Updated again.
--
Thanks, Amit Langote
Attachments:
v5-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchapplication/octet-stream; name=v5-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchDownload+44-2
On Thu, 18 Sept 2025 at 18:20, Amit Langote <amitlangote09@gmail.com> wrote:
Oops, coffee deficiency. Updated again.
Thanks for updating the patch.
I think the original intent of "Parsed test spec with 3 sessions"
(from 759d9d676) was two sessions doing work, and an independent
observer session. Is there a reason to add 2 more sessions? Maybe it's
me not working with the isolation tester often enough, but I'd have
expected you to add steps for s1 and s2 then define permutations for
those steps.
David
On Thu, Sep 18, 2025 at 4:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 18 Sept 2025 at 18:20, Amit Langote <amitlangote09@gmail.com> wrote:
Oops, coffee deficiency. Updated again.
Thanks for updating the patch.
I think the original intent of "Parsed test spec with 3 sessions"
(from 759d9d676) was two sessions doing work, and an independent
observer session. Is there a reason to add 2 more sessions? Maybe it's
me not working with the isolation tester often enough, but I'd have
expected you to add steps for s1 and s2 then define permutations for
those steps.
Yeah, that makes the additions smaller and avoids slowing down the
suite. It did take a bit of fiddling though -- after switching to use
the existing sessions, teardown in s1 hung because I hadn’t noticed
that s2’s setup already does a BEGIN, so the delete side ended up
without a COMMIT, leaving an open transaction that blocked DROP in
s1’s teardown. Easier to take shortcuts with new sessions like I
first did, but better to keep the suite light. :-)
Done in the attached.
--
Thanks, Amit Langote
Attachments:
v6-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchapplication/octet-stream; name=v6-0001-Fix-EPQ-crash-from-missing-partition-pruning-stat.patchDownload+26-1
On Thu, 18 Sept 2025 at 20:25, Amit Langote <amitlangote09@gmail.com> wrote:
On Thu, Sep 18, 2025 at 4:06 PM David Rowley <dgrowleyml@gmail.com> wrote:
I think the original intent of "Parsed test spec with 3 sessions"
(from 759d9d676) was two sessions doing work, and an independent
observer session. Is there a reason to add 2 more sessions? Maybe it's
me not working with the isolation tester often enough, but I'd have
expected you to add steps for s1 and s2 then define permutations for
those steps.Yeah, that makes the additions smaller and avoids slowing down the
suite. It did take a bit of fiddling though -- after switching to use
the existing sessions, teardown in s1 hung because I hadn’t noticed
that s2’s setup already does a BEGIN, so the delete side ended up
without a COMMIT, leaving an open transaction that blocked DROP in
s1’s teardown. Easier to take shortcuts with new sessions like I
first did, but better to keep the suite light. :-)Done in the attached.
Thanks for updating. Looks good. I verified the test fails with the
code change reverted and passes with the change.
David