wrong Append/MergeAppend elision?
Hi,
It seems that the planner currently elides an Append/MergeAppend that
has run-time pruning info (part_prune_index) set, but which I think is
a bug. Here's an example:
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
set plan_cache_mode to force_generic_plan ;
prepare q as select * from p where a = $1;
explain execute q (0);
QUERY PLAN
------------------------------------------------------
Seq Scan on p1 p (cost=0.00..41.88 rows=13 width=4)
Filter: (a = $1)
(2 rows)
Because the Append is elided in this case, run-time pruning doesn't
kick in to prune p1, even though PartitionPruneInfo to do so has been
generated.
Attached find a patch to fix that. There are some expected output
diffs in partition_prune suite, though they all look sane to me.
Thoughts?
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v1-0001-Don-t-elide-Append-MergeAppend-if-run-time-prunin.patchapplication/octet-stream; name=v1-0001-Don-t-elide-Append-MergeAppend-if-run-time-prunin.patchDownload
From e2ee89a7e36641bd0386a8c7da5e5ebfebc78f92 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 26 Jan 2023 21:10:19 +0900
Subject: [PATCH v1] Don't elide Append/MergeAppend if run-time pruning present
---
src/backend/optimizer/plan/setrefs.c | 27 +++---
src/test/regress/expected/partition_prune.out | 87 ++++++++++++-------
src/test/regress/sql/partition_prune.sql | 8 ++
3 files changed, 80 insertions(+), 42 deletions(-)
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 85ba9d1ca1..4e582afb0a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1708,13 +1708,14 @@ set_append_references(PlannerInfo *root,
/*
* See if it's safe to get rid of the Append entirely. For this to be
- * safe, there must be only one child plan and that child plan's parallel
- * awareness must match the Append's. The reason for the latter is that
- * if the Append is parallel aware and the child is not, then the calling
- * plan may execute the non-parallel aware child multiple times. (If you
+ * safe, there must be only one child plan and no run-time pruning info
+ * must have been set. Also, the only child plan's parallel awareness
+ * must match the Append's. The reason for the latter is that if the
+ * Append is parallel aware and the child is not, then the calling plan
+ * may execute the non-parallel aware child multiple times. (If you
* change these rules, update create_append_path to match.)
*/
- if (list_length(aplan->appendplans) == 1)
+ if (list_length(aplan->appendplans) == 1 && aplan->part_prune_index < 0)
{
Plan *p = (Plan *) linitial(aplan->appendplans);
@@ -1773,15 +1774,15 @@ set_mergeappend_references(PlannerInfo *root,
}
/*
- * See if it's safe to get rid of the MergeAppend entirely. For this to
- * be safe, there must be only one child plan and that child plan's
- * parallel awareness must match the MergeAppend's. The reason for the
- * latter is that if the MergeAppend is parallel aware and the child is
- * not, then the calling plan may execute the non-parallel aware child
- * multiple times. (If you change these rules, update
- * create_merge_append_path to match.)
+ * See if it's safe to get rid of the MergeAppend entirely. For this to be
+ * safe, there must be only one child plan and no run-time pruning info
+ * must have been set. Also, the only child plan's parallel awareness must
+ * match the MergeAppend's. The reason for the latter is that if the
+ * MergeAppend is parallel aware and the child is not, then the calling
+ * plan may execute the non-parallel aware child multiple times. (If you
+ * change these rules, update create_merge_append_path to match.)
*/
- if (list_length(mplan->mergeplans) == 1)
+ if (list_length(mplan->mergeplans) == 1 && mplan->part_prune_index < 0)
{
Plan *p = (Plan *) linitial(mplan->mergeplans);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 7555764c77..9554559215 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1935,6 +1935,30 @@ explain (analyze, costs off, summary off, timing off) select * from list_part wh
(13 rows)
rollback;
+-- Test the case where an Append with only one run-time prunable partition
+-- must not be elided
+explain (analyze, costs off, summary off, timing off)
+ select * from list_part where a > 3 and a = (select 4);
+ QUERY PLAN
+------------------------------------------------------------------
+ Append (actual rows=1 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on list_part4 list_part_1 (actual rows=1 loops=1)
+ Filter: ((a > 3) AND (a = $0))
+(5 rows)
+
+explain (analyze, costs off, summary off, timing off)
+ select * from list_part where a > 3 and a = (select 5);
+ QUERY PLAN
+-----------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ InitPlan 1 (returns $0)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on list_part4 list_part_1 (never executed)
+ Filter: ((a > 3) AND (a = $0))
+(5 rows)
+
drop table list_part;
-- Parallel append
-- Parallel queries won't necessarily get as many workers as the planner
@@ -2791,11 +2815,12 @@ prepare part_abc_q1 (int, int, int) as
select * from part_abc where a = $1 and b = $2 and c = $3;
-- Single partition should be scanned.
explain (analyze, costs off, summary off, timing off) execute part_abc_q1 (1, 2, 3);
- QUERY PLAN
-----------------------------------------------------------
- Seq Scan on part_abc_p1 part_abc (actual rows=0 loops=1)
- Filter: ((a = $1) AND (b = $2) AND (c = $3))
-(2 rows)
+ QUERY PLAN
+------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+ -> Seq Scan on part_abc_p1 part_abc_1 (actual rows=0 loops=1)
+ Filter: ((a = $1) AND (b = $2) AND (c = $3))
+(3 rows)
deallocate part_abc_q1;
drop table part_abc;
@@ -3609,13 +3634,14 @@ create table listp2 partition of listp for values in(2) partition by list(b);
create table listp2_10 partition of listp2 for values in (10);
explain (analyze, costs off, summary off, timing off)
select * from listp where a = (select 2) and b <> 10;
- QUERY PLAN
---------------------------------------------------
- Seq Scan on listp1 listp (actual rows=0 loops=1)
- Filter: ((b <> 10) AND (a = $0))
+ QUERY PLAN
+---------------------------------------------------
+ Append (actual rows=0 loops=1)
InitPlan 1 (returns $0)
- -> Result (never executed)
-(4 rows)
+ -> Result (actual rows=1 loops=1)
+ -> Seq Scan on listp1 listp_1 (never executed)
+ Filter: ((b <> 10) AND (a = $0))
+(5 rows)
--
-- check that a partition directly accessed in a query is excluded with
@@ -3674,8 +3700,8 @@ alter table listp_12_1 set (parallel_workers = 0);
-- Ensure that listp_12_2 is not scanned. (The nested Append is not seen in
-- the plan as it's pulled in setref.c due to having just a single subnode).
select explain_parallel_append('select * from listp where a = (select 1);');
- explain_parallel_append
-----------------------------------------------------------------------
+ explain_parallel_append
+----------------------------------------------------------------------------
Gather (actual rows=N loops=N)
Workers Planned: 2
Params Evaluated: $0
@@ -3683,11 +3709,12 @@ select explain_parallel_append('select * from listp where a = (select 1);');
InitPlan 1 (returns $0)
-> Result (actual rows=N loops=N)
-> Parallel Append (actual rows=N loops=N)
- -> Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
- Filter: (a = $0)
- -> Parallel Seq Scan on listp_12_2 listp_2 (never executed)
- Filter: (a = $0)
-(11 rows)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on listp_12_1 listp_2 (actual rows=N loops=N)
+ Filter: (a = $0)
+ -> Parallel Seq Scan on listp_12_2 listp_3 (never executed)
+ Filter: (a = $0)
+(12 rows)
-- Like the above but throw some more complexity at the planner by adding
-- a UNION ALL. We expect both sides of the union not to scan the
@@ -3696,8 +3723,8 @@ select explain_parallel_append(
'select * from listp where a = (select 1)
union all
select * from listp where a = (select 2);');
- explain_parallel_append
------------------------------------------------------------------------------------
+ explain_parallel_append
+-----------------------------------------------------------------------------------------
Append (actual rows=N loops=N)
-> Gather (actual rows=N loops=N)
Workers Planned: 2
@@ -3706,10 +3733,11 @@ select * from listp where a = (select 2);');
InitPlan 1 (returns $0)
-> Result (actual rows=N loops=N)
-> Parallel Append (actual rows=N loops=N)
- -> Seq Scan on listp_12_1 listp_1 (actual rows=N loops=N)
- Filter: (a = $0)
- -> Parallel Seq Scan on listp_12_2 listp_2 (never executed)
- Filter: (a = $0)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on listp_12_1 listp_2 (actual rows=N loops=N)
+ Filter: (a = $0)
+ -> Parallel Seq Scan on listp_12_2 listp_3 (never executed)
+ Filter: (a = $0)
-> Gather (actual rows=N loops=N)
Workers Planned: 2
Params Evaluated: $1
@@ -3717,11 +3745,12 @@ select * from listp where a = (select 2);');
InitPlan 2 (returns $1)
-> Result (actual rows=N loops=N)
-> Parallel Append (actual rows=N loops=N)
- -> Seq Scan on listp_12_1 listp_4 (never executed)
- Filter: (a = $1)
- -> Parallel Seq Scan on listp_12_2 listp_5 (actual rows=N loops=N)
- Filter: (a = $1)
-(23 rows)
+ -> Parallel Append (actual rows=N loops=N)
+ -> Seq Scan on listp_12_1 listp_6 (never executed)
+ Filter: (a = $1)
+ -> Parallel Seq Scan on listp_12_2 listp_7 (actual rows=N loops=N)
+ Filter: (a = $1)
+(25 rows)
drop table listp;
reset parallel_tuple_cost;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d70bd8610c..560aa0ccc0 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -439,6 +439,14 @@ explain (analyze, costs off, summary off, timing off) select * from list_part wh
rollback;
+-- Test the case where an Append with only one run-time prunable partition
+-- must not be elided
+explain (analyze, costs off, summary off, timing off)
+ select * from list_part where a > 3 and a = (select 4);
+explain (analyze, costs off, summary off, timing off)
+ select * from list_part where a > 3 and a = (select 5);
+
+
drop table list_part;
-- Parallel append
--
2.35.3
On Fri, 27 Jan 2023 at 01:30, Amit Langote <amitlangote09@gmail.com> wrote:
It seems that the planner currently elides an Append/MergeAppend that
has run-time pruning info (part_prune_index) set, but which I think is
a bug.
This is actually how I intended it to work. Whether it was a good idea
or not, I'm currently unsure. I mentioned it in [1]/messages/by-id/CAKJS1f_utf1Mbp8UeoByAarziO4e4qb4Z8FksurpaM+3Q_HOmQ@mail.gmail.com.
I think the plan shapes I was talking about were some ordered paths
from partial paths per what is being added right at the end of
add_paths_to_append_rel(). However, now that I look at it again, I'm
not sure why it wouldn't be correct to still have those paths with a
single-child Append. Certainly, the "if (list_length(live_childrels)
== 1)" test made in add_paths_to_append_rel() is no longer aligned to
the equivalent test in set_append_references(), so it's possible even
now that we make a plan that uses the extra sorted partial paths added
in add_paths_to_append_rel() and still have the Append in the final
plan.
There is still the trade-off of having to pull tuples through the
Append node for when run-time pruning is unable to prune the last
partition. So your proposal to leave the Append alone when there's
run-time pruning info is certainly not a no-brainer.
[1]: /messages/by-id/CAKJS1f_utf1Mbp8UeoByAarziO4e4qb4Z8FksurpaM+3Q_HOmQ@mail.gmail.com
David Rowley <dgrowleyml@gmail.com> writes:
On Fri, 27 Jan 2023 at 01:30, Amit Langote <amitlangote09@gmail.com> wrote:
It seems that the planner currently elides an Append/MergeAppend that
has run-time pruning info (part_prune_index) set, but which I think is
a bug.
There is still the trade-off of having to pull tuples through the
Append node for when run-time pruning is unable to prune the last
partition. So your proposal to leave the Append alone when there's
run-time pruning info is certainly not a no-brainer.
Yeah. Amit's proposal amounts to optimizing for the case that all
partitions get pruned, which does not seem to me to be the way
to bet. I'm inclined to think it's fine as-is.
regards, tom lane
On Fri, Jan 27, 2023 at 5:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
On Fri, 27 Jan 2023 at 01:30, Amit Langote <amitlangote09@gmail.com> wrote:
It seems that the planner currently elides an Append/MergeAppend that
has run-time pruning info (part_prune_index) set, but which I think is
a bug.There is still the trade-off of having to pull tuples through the
Append node for when run-time pruning is unable to prune the last
partition. So your proposal to leave the Append alone when there's
run-time pruning info is certainly not a no-brainer.Yeah. Amit's proposal amounts to optimizing for the case that all
partitions get pruned, which does not seem to me to be the way
to bet. I'm inclined to think it's fine as-is.
Fair enough. I thought for a second that maybe it was simply an
oversight but David confirms otherwise. This was interacting badly
with the other patch I'm working on and I just figured out the problem
was with that other patch.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com