Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Started by Phil Florent over 7 years ago, 48 messages
#1 Phil Florent
philflorent@hotmail.com

Hi,

I got an XX000 error while testing my DSS application with PostgreSQL 11 beta 1.

Here is a simplified version of my test, with no data in the tables:

-- 11
select version();
version
-----------------------------------------------------------------------------------------------------------------------
PostgreSQL 11beta1 (Debian 11~beta1-2.pgdg+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 7.3.0-19) 7.3.0, 64-bit
(1 row)

-- connected superuser -- postgres
create user a password 'a';
create schema a authorization a;
create user b password 'b';
create schema b authorization b;
create user c password 'c';
create schema c authorization c;
create user z password 'z';
create schema z authorization z;

-- connected a
create table t1(k1 timestamp, c1 int);
create view v as select k1, c1 from t1;
grant usage on schema a to z;
grant select on all tables in schema a to z;

-- connected b
create table t2(k1 timestamp, c1 int) partition by range(k1);
create table t2_2016 partition of t2 for values from ('2016-01-01') to ('2017-01-01');
create table t2_2017 partition of t2 for values from ('2017-01-01') to ('2018-01-01');
create table t2_2018 partition of t2 for values from ('2018-01-01') to ('2019-01-01');
create view v as select k1, c1 from t2;
grant select on all tables in schema b to z;
grant usage on schema b to z;

-- connected c
create table t3(k1 timestamp, c1 int) partition by range(k1);
create table t3_2016 partition of t3 for values from ('2016-01-01') to ('2017-01-01');
create table t3_2017 partition of t3 for values from ('2017-01-01') to ('2018-01-01');
create table t3_2018 partition of t3 for values from ('2018-01-01') to ('2019-01-01');
create view v as select k1, c1 from t3;
grant select on all tables in schema c to z;
grant usage on schema c to z;

-- connected z
create view v as
select k1, c1 from
(select * from a.v
UNION ALL
select * from b.v
UNION ALL
select * from c.v) vabc ;

explain analyze select * from v where v.k1 > date '2017-01-01';
ERROR: XX000: did not find all requested child rels in append_rel_list
LOCATION: find_appinfos_by_relids, prepunion.c:2643

set enable_partition_pruning=off;
SET

explain analyze select * from v where v.k1 > date '2017-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Append (cost=0.00..272.30 rows=4760 width=12) (actual time=0.217..0.217 rows=0 loops=1)
-> Seq Scan on t1 (cost=0.00..35.50 rows=680 width=12) (actual time=0.020..0.020 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t2_2016 (cost=0.00..35.50 rows=680 width=12) (actual time=0.035..0.035 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t2_2017 (cost=0.00..35.50 rows=680 width=12) (actual time=0.016..0.016 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t2_2018 (cost=0.00..35.50 rows=680 width=12) (actual time=0.015..0.015 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t3_2016 (cost=0.00..35.50 rows=680 width=12) (actual time=0.040..0.040 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t3_2017 (cost=0.00..35.50 rows=680 width=12) (actual time=0.016..0.016 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t3_2018 (cost=0.00..35.50 rows=680 width=12) (actual time=0.016..0.016 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
Planning Time: 0.639 ms
Execution Time: 0.400 ms

set enable_partition_pruning=on;
SET

explain analyze select * from v where v.k1 > date '2017-01-01';
ERROR: XX000: did not find all requested child rels in append_rel_list
LOCATION: find_appinfos_by_relids, prepunion.c:2643

-- 10
select version();
version
--------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
(1 row)

-- connected superuser -- postgres
create user a password 'a';
create schema a authorization a;

create user b password 'b';
create schema b authorization b;

create user c password 'c';
create schema c authorization c;

create user z password 'z';
create schema z authorization z;

-- connected a
create table t1(k1 timestamp, c1 int);
create view v as select k1, c1 from t1;
grant usage on schema a to z;
grant select on all tables in schema a to z;

-- connected b
create table t2(k1 timestamp, c1 int) partition by range(k1);
create table t2_2016 partition of t2 for values from ('2016-01-01') to ('2017-01-01');
create table t2_2017 partition of t2 for values from ('2017-01-01') to ('2018-01-01');
create table t2_2018 partition of t2 for values from ('2018-01-01') to ('2019-01-01');
create view v as select k1, c1 from t2;
grant select on all tables in schema b to z;
grant usage on schema b to z;

-- connected c
create table t3(k1 timestamp, c1 int) partition by range(k1);
create table t3_2016 partition of t3 for values from ('2016-01-01') to ('2017-01-01');
create table t3_2017 partition of t3 for values from ('2017-01-01') to ('2018-01-01');
create table t3_2018 partition of t3 for values from ('2018-01-01') to ('2019-01-01');
create view v as select k1, c1 from t3;
grant select on all tables in schema c to z;
grant usage on schema c to z;

-- connected z
create view v as
select k1, c1 from
(select * from a.v
UNION ALL
select * from b.v
UNION ALL
select * from c.v) vabc ;

explain analyze select * from v where v.k1 > date '2017-01-01';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Append (cost=0.00..177.50 rows=3400 width=12) (actual time=0.206..0.206 rows=0 loops=1)
-> Seq Scan on t1 (cost=0.00..35.50 rows=680 width=12) (actual time=0.044..0.044 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t2_2017 (cost=0.00..35.50 rows=680 width=12) (actual time=0.020..0.020 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t2_2018 (cost=0.00..35.50 rows=680 width=12) (actual time=0.020..0.020 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t3_2017 (cost=0.00..35.50 rows=680 width=12) (actual time=0.036..0.036 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
-> Seq Scan on t3_2018 (cost=0.00..35.50 rows=680 width=12) (actual time=0.020..0.020 rows=0 loops=1)
Filter: (k1 > '2017-01-01'::date)
Planning time: 0.780 ms
Execution time: 0.427 ms

Best regards

Phil

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Phil Florent (#1)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Phil Florent <philflorent@hotmail.com> writes:

explain analyze select * from v where v.k1 > date '2017-01-01';
ERROR: XX000: did not find all requested child rels in append_rel_list
LOCATION: find_appinfos_by_relids, prepunion.c:2643

Reproduced here, thanks for the report! This is very timely since
we were just in the process of rewriting that code anyway ...

regards, tom lane

#3 David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 9 June 2018 at 04:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Phil Florent <philflorent@hotmail.com> writes:

explain analyze select * from v where v.k1 > date '2017-01-01';
ERROR: XX000: did not find all requested child rels in append_rel_list
LOCATION: find_appinfos_by_relids, prepunion.c:2643

Reproduced here, thanks for the report! This is very timely since
we were just in the process of rewriting that code anyway ...

Yeah. Thanks for the report Phil.

It looks like this was 499be013de6, which was one of mine.

A more simple case to reproduce is:

drop table listp;
create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in (1);
select * from (select * from listp union all select * from listp) t where a = 1;

I'll look in more detail after sleeping.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4 David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#3)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 9 June 2018 at 06:50, David Rowley <david.rowley@2ndquadrant.com> wrote:

It looks like this was 499be013de6, which was one of mine.

A more simple case to reproduce is:

drop table listp;
create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in (1);
select * from (select * from listp union all select * from listp) t where a = 1;

I'll look in more detail after sleeping.

So it looks like I've assumed that the Append path's partitioned_rels
will only ever be set for partitioned tables, but it can, in fact, be
set for UNION ALL parents too when the union children are partitioned
tables.

As a discussion topic, I've attached a patch which does resolve the
error, but it also disables run-time pruning in this case.

There might be some way we can treat UNION ALL parents differently
when building the PartitionPruneInfos. I've just not thought of what
this would be just yet. If I can't think of that, I wonder if this is
a rare enough case not to bother with run-time partition pruning.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

run-time_prune_only_for_partitioned_tables.patch (application/octet-stream)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ca2e0527db..fad14da892 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1079,7 +1079,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 	if (enable_partition_pruning &&
 		rel->reloptkind == RELOPT_BASEREL &&
-		best_path->partitioned_rels != NIL)
+		best_path->partitioned_rels != NIL &&
+		root->simple_rte_array[rel->relid]->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		List	   *prunequal;
 
#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#4)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

David Rowley <david.rowley@2ndquadrant.com> writes:

So it looks like I've assumed that the Append path's partitioned_rels
will only ever be set for partitioned tables, but it can, in fact, be
set for UNION ALL parents too when the union children are partitioned
tables.

As a discussion topic, I've attached a patch which does resolve the
error, but it also disables run-time pruning in this case.

There might be some way we can treat UNION ALL parents differently
when building the PartitionPruneInfos. I've just not thought of what
this would be just yet. If I can't think of that, I wonder if this is
a rare enough case not to bother with run-time partition pruning.

So, IIUC, the issue is that for partitioning cases Append expects *all*
its children to be partitions of the *same* partitioned table? That
is, you could also break it with

select * from partitioned_table_a
union all
select * from partitioned_table_b

?

If so, maybe the best solution is to not allow a partitioning appendrel
to be flattened into an appendrel generated in other ways (particularly,
via UNION ALL). I also wonder whether it was a bad idea to treat these
as the same kind of path/plan in the first place.

regards, tom lane

#6 David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 10 June 2018 at 04:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So, IIUC, the issue is that for partitioning cases Append expects *all*
its children to be partitions of the *same* partitioned table? That
is, you could also break it with

select * from partitioned_table_a
union all
select * from partitioned_table_b

?

Not quite. I think what I sent above is the most simple way to break
it. Your case won't because there are no quals to prune with, so
run-time pruning is never attempted.

If so, maybe the best solution is to not allow a partitioning appendrel
to be flattened into an appendrel generated in other ways (particularly,
via UNION ALL). I also wonder whether it was a bad idea to treat these
as the same kind of path/plan in the first place.

That might be the best idea. I'll look into that now. The only
drawback I see is that we'll end up pulling tuples through more Append
nodes in cases like you mentioned above.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#6)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

David Rowley <david.rowley@2ndquadrant.com> writes:

On 10 June 2018 at 04:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So, IIUC, the issue is that for partitioning cases Append expects *all*
its children to be partitions of the *same* partitioned table? That
is, you could also break it with

select * from partitioned_table_a
union all
select * from partitioned_table_b

?

Not quite. I think what I sent above is the most simple way to break
it. Your case won't because there are no quals to prune with, so
run-time pruning is never attempted.

Well, I hadn't bothered to put in the extra code needed to have a qual
to prune with, but my point remains that it doesn't seem like the current
Append code is prepared to cope with an Append that contains partitions
of more than one top-level partitioned table.

I just had a thought that might lead to a nice solution to that, or
might be totally crazy. What if we inverted the sense of the bitmaps
that track partition pruning state, so that instead of a bitmap of
valid partitions that need to be scanned, we had a bitmap of pruned
partitions that we know we don't need to scan? (The indexes of this
bitmap would be subplan indexes not partition indexes.) With this
representation, it doesn't matter if some of the Append's children
are not supposed to participate in pruning; they just don't ever get
added to the bitmap of what to skip. It's also fairly clear, I think,
how to handle independent pruning rules for different top-level tables
that are being unioned together: just OR the what-to-skip bitmaps.
But there may be some reason why this isn't workable.

regards, tom lane
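
A minimal sketch of the what-to-skip representation described above, under stated assumptions: the type SubplanSet and the three helper functions are hypothetical names invented for illustration (PostgreSQL's own Bitmapset is not modelled here), and the Append is assumed to have at most 64 subplans so that one bit per subplan fits in a uint64_t. Each top-level partitioned table contributes its own skip set; subplans that take no part in pruning simply never appear in any set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per Append subplan, so at most 64 subplans. */
typedef uint64_t SubplanSet;

/* Build a set from an array of subplan indexes. */
static SubplanSet
subplan_set_from(const int *indexes, int n)
{
    SubplanSet  set = 0;

    for (int i = 0; i < n; i++)
    {
        assert(indexes[i] >= 0 && indexes[i] < 64);
        set |= (SubplanSet) 1 << indexes[i];
    }
    return set;
}

/* OR together the what-to-skip sets produced for each hierarchy. */
static SubplanSet
combine_skip_sets(const SubplanSet *skip, int nhierarchies)
{
    SubplanSet  all = 0;

    for (int i = 0; i < nhierarchies; i++)
        all |= skip[i];
    return all;
}

/* A subplan is scanned unless some hierarchy asked for it to be skipped. */
static bool
subplan_is_scanned(SubplanSet skip, int subplan)
{
    assert(subplan >= 0 && subplan < 64);
    return (skip & ((SubplanSet) 1 << subplan)) == 0;
}
```

For example, if subplan 0 is a plain table, subplans 1 and 2 are partitions of one table, and subplans 3 and 4 are partitions of another, then skip sets {1} and {3, 4} combine to {1, 3, 4}, leaving subplans 0 and 2 to be scanned.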

#8 David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#7)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 11 June 2018 at 12:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 10 June 2018 at 04:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So, IIUC, the issue is that for partitioning cases Append expects *all*
its children to be partitions of the *same* partitioned table? That
is, you could also break it with

select * from partitioned_table_a
union all
select * from partitioned_table_b

?

Not quite. I think what I sent above is the most simple way to break
it. Your case won't because there are no quals to prune with, so
run-time pruning is never attempted.

Well, I hadn't bothered to put in the extra code needed to have a qual
to prune with, but my point remains that it doesn't seem like the current
Append code is prepared to cope with an Append that contains partitions
of more than one top-level partitioned table.

Besides the partition pruning code, were there other aspects of Append
that you saw to be incompatible with mixed hierarchies?

I just had a thought that might lead to a nice solution to that, or
might be totally crazy. What if we inverted the sense of the bitmaps
that track partition pruning state, so that instead of a bitmap of
valid partitions that need to be scanned, we had a bitmap of pruned
partitions that we know we don't need to scan? (The indexes of this
bitmap would be subplan indexes not partition indexes.) With this
representation, it doesn't matter if some of the Append's children
are not supposed to participate in pruning; they just don't ever get
added to the bitmap of what to skip. It's also fairly clear, I think,
how to handle independent pruning rules for different top-level tables
that are being unioned together: just OR the what-to-skip bitmaps.
But there may be some reason why this isn't workable.

I think it would be less efficient. A common case and one that I very
much would like to make as fast as possible is when all but a single
partition is pruned. Doing the opposite sounds like more effort would
need to be expended to get the subplans that we do need to scan.

I don't really see the way it works now as a huge problem to overcome
in pruning. We'd just need a list of subplans that don't belong to the
hierarchy and tag them on to the matches found in
ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans. The
bigger issue to overcome is the mixed flattened list of partition RT
indexes in partitioned_rels. Perhaps having a list of Lists for
partitioned_rels could be used to resolve that. The question is more,
how to solve for PG11. Do we need that?
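
A rough sketch of that alternative, assuming one entry per partition hierarchy (the "list of Lists" idea) plus a separate set of subplans that belong to no hierarchy. ToyHierarchy, SubplanSet and subplans_to_scan are hypothetical names used only for illustration, and the 64-subplan limit is an assumption of the sketch, not of the executor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one bit per Append subplan, so at most 64 subplans. */
typedef uint64_t SubplanSet;

/* Hypothetical per-hierarchy pruning result. */
typedef struct ToyHierarchy
{
    SubplanSet  members;    /* subplans that are partitions of this table */
    SubplanSet  matching;   /* members that survived pruning */
} ToyHierarchy;

/*
 * Keep the matches-to-scan representation: start from the subplans that
 * never take part in pruning and add each hierarchy's surviving matches.
 */
static SubplanSet
subplans_to_scan(const ToyHierarchy *hier, int nhier, SubplanSet unprunable)
{
    SubplanSet  result = unprunable;

    for (int i = 0; i < nhier; i++)
    {
        /* Pruning for a hierarchy may only select its own partitions. */
        assert((hier[i].matching & ~hier[i].members) == 0);
        result |= hier[i].matching;
    }
    return result;
}
```

When all but one partition is pruned, each hierarchy's matching set holds a single bit, which is the common case this representation keeps cheap.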

I think we'll very soon be wanting to have ordered partition scans
where something like:

create table listp(a int) partition by list(a);
create index on listp(a);
create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);

and

select * from listp order by a;

would be possible with an Append and Index Scan, rather than having a
MergeAppend or Sort. In which case we'll not want mixed partition
hierarchies in the Append subplans. Although, perhaps that would mean
we just wouldn't pullup AppendPaths which have PathKeys.

I have written and attached the patch to stop flattening of
partitioned tables into UNION ALL parent's paths, meaning we can now
get nested Append and MergeAppend paths.
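
The flattening rule can be pictured with a toy model. This is only a sketch under stated assumptions: ToyPath and toy_accumulate are hypothetical stand-ins, not the planner's Path struct or the real accumulate_append_subpath, and only one level of pull-up is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a planner path; not PostgreSQL's Path struct. */
typedef struct ToyPath
{
    bool        is_append;      /* Append/MergeAppend rather than a scan */
    bool        partitioned;    /* path's rel is a partitioned table */
    int         nchildren;      /* number of entries in children[] */
    struct ToyPath **children;  /* subpaths, used only when is_append */
} ToyPath;

/*
 * Add 'path' to out[] (capacity maxout) and return the new element count.
 * A child Append is normally bypassed by pulling up its subpaths, except
 * when the parent is a UNION ALL and the child is a partitioned table.
 */
static int
toy_accumulate(bool parent_is_union_all, ToyPath *path,
               ToyPath **out, int nout, int maxout)
{
    if (path->is_append && !(parent_is_union_all && path->partitioned))
    {
        /* Pull the child's subpaths up, bypassing its Append node. */
        for (int i = 0; i < path->nchildren; i++)
        {
            assert(nout < maxout);
            out[nout++] = path->children[i];
        }
        return nout;
    }

    /* A plain scan, or a partitioned table kept as a nested Append. */
    assert(nout < maxout);
    out[nout++] = path;
    return nout;
}
```

Under a UNION ALL parent, a partitioned child therefore stays as a single nested entry, while an ordinary inheritance child still has its subpaths pulled up.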

I've added Robert too as I know he was the committer of partitioning
and parallel Append. Maybe he has a view on what should be done about
this? Is not flattening the paths a problem?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

dont_flatten_append_paths_for_partitions.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 477b9f7fb8..6ac8d9767b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -102,8 +102,9 @@ static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
 static Path *get_cheapest_parameterized_child_path(PlannerInfo *root,
 									  RelOptInfo *rel,
 									  Relids required_outer);
-static void accumulate_append_subpath(Path *path,
-						  List **subpaths, List **special_subpaths);
+static void accumulate_append_subpath(RelOptInfo *parentrel,
+						  RelOptInfo *childrel, Path *path, List **subpaths,
+						  List **special_subpaths);
 static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 					  Index rti, RangeTblEntry *rte);
 static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -1395,17 +1396,6 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	 * AppendPath generated for partitioned tables must record the RT indexes
 	 * of partitioned tables that are direct or indirect children of this
 	 * Append rel.
-	 *
-	 * AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
-	 * itself does not represent a partitioned relation, but the child sub-
-	 * queries may contain references to partitioned relations.  The loop
-	 * below will look for such children and collect them in a list to be
-	 * passed to the path creation function.  (This assumes that we don't need
-	 * to look through multiple levels of subquery RTEs; if we ever do, we
-	 * could consider stuffing the list we generate here into sub-query RTE's
-	 * RelOptInfo, just like we do for partitioned rels, which would be used
-	 * when populating our parent rel with paths.  For the present, that
-	 * appears to be unnecessary.)
 	 */
 	if (rel->part_scheme != NULL)
 	{
@@ -1435,10 +1425,16 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 
 		Assert(list_length(partitioned_rels) >= 1);
 	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-		build_partitioned_rels = true;
 
 	/*
+	 * For childrels that are themselves inheritance or UNION ALL parents,
+	 * we recursively pullup their Append and MergeAppend subpaths into rel's
+	 * path lists.  This effectively flattens the hierarchy and stops nested
+	 * Append/MergeAppend paths forming.  This is not done for UNION ALL
+	 * parents with partitioned tables in their subpaths.  These are left
+	 * unflattened as run-time partition pruning requires partitioned_rels to
+	 * only contain partitions which belong to a single hierarchy.
+	 *
 	 * For every non-dummy child, remember the cheapest path.  Also, identify
 	 * all pathkeys (orderings) and parameterizations (required_outer sets)
 	 * available for the non-dummy member relations.
@@ -1471,7 +1467,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		 */
 		if (childrel->pathlist != NIL &&
 			childrel->cheapest_total_path->param_info == NULL)
-			accumulate_append_subpath(childrel->cheapest_total_path,
+			accumulate_append_subpath(rel, childrel, childrel->cheapest_total_path,
 									  &subpaths, NULL);
 		else
 			subpaths_valid = false;
@@ -1480,7 +1476,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		if (childrel->partial_pathlist != NIL)
 		{
 			cheapest_partial_path = linitial(childrel->partial_pathlist);
-			accumulate_append_subpath(cheapest_partial_path,
+			accumulate_append_subpath(rel, childrel, cheapest_partial_path,
 									  &partial_subpaths, NULL);
 		}
 		else
@@ -1508,7 +1504,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 			{
 				/* Partial path is cheaper or the only option. */
 				Assert(cheapest_partial_path != NULL);
-				accumulate_append_subpath(cheapest_partial_path,
+				accumulate_append_subpath(rel, childrel,
+										  cheapest_partial_path,
 										  &pa_partial_subpaths,
 										  &pa_nonpartial_subpaths);
 
@@ -1528,7 +1525,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 				 * be given to different workers.  For now, we don't try to
 				 * figure that out.
 				 */
-				accumulate_append_subpath(nppath,
+				accumulate_append_subpath(rel, childrel, nppath,
 										  &pa_nonpartial_subpaths,
 										  NULL);
 			}
@@ -1754,7 +1751,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 				subpaths_valid = false;
 				break;
 			}
-			accumulate_append_subpath(subpath, &subpaths, NULL);
+			accumulate_append_subpath(rel, childrel, subpath, &subpaths,
+									  NULL);
 		}
 
 		if (subpaths_valid)
@@ -1845,9 +1843,9 @@ generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
 			if (cheapest_startup != cheapest_total)
 				startup_neq_total = true;
 
-			accumulate_append_subpath(cheapest_startup,
+			accumulate_append_subpath(rel, childrel, cheapest_startup,
 									  &startup_subpaths, NULL);
-			accumulate_append_subpath(cheapest_total,
+			accumulate_append_subpath(rel, childrel, cheapest_total,
 									  &total_subpaths, NULL);
 		}
 
@@ -1946,10 +1944,13 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
  * accumulate_append_subpath
  *		Add a subpath to the list being built for an Append or MergeAppend.
  *
- * It's possible that the child is itself an Append or MergeAppend path, in
- * which case we can "cut out the middleman" and just add its child paths to
- * our own list.  (We don't try to do this earlier because we need to apply
- * both levels of transformation to the quals.)
+ * For UNION ALL or inheritance parent childrels, we pullup the childrels
+ * Append and MergeAppend subpaths into 'subpaths' effectively bypassing the
+ * childrel's Append and MergeAppend paths.  We don't do this for partitioned
+ * childrels which are parented by UNION ALL parents as mixed partition
+ * hierarchies are not compatible with run-time partition pruning.
+ * (We perform the pullup operation here rather than earlier because we need
+ * to apply both levels of transformation to the quals.)
  *
  * Note that if we omit a child MergeAppend in this way, we are effectively
  * omitting a sort step, which seems fine: if the parent is to be an Append,
@@ -1965,8 +1966,15 @@ get_cheapest_parameterized_child_path(PlannerInfo *root, RelOptInfo *rel,
  * paths).
  */
 static void
-accumulate_append_subpath(Path *path, List **subpaths, List **special_subpaths)
+accumulate_append_subpath(RelOptInfo *parentrel, RelOptInfo *childrel,
+						  Path *path, List **subpaths,
+						  List **special_subpaths)
 {
+	if (parentrel->rtekind == RTE_SUBQUERY && childrel->part_scheme != NULL)
+	{
+		*subpaths = lappend(*subpaths, path);
+		return;
+	}
 	if (IsA(path, AppendPath))
 	{
 		AppendPath *apath = (AppendPath *) path;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cf82b7052d..b848897693 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1065,6 +1065,12 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 		return plan;
 	}
 
+	/*
+	 * Ensure that partitioned_rels is set for partitioned tables, and never
+	 * set otherwise.
+	 */
+	Assert((best_path->partitioned_rels != NIL) == (rel->part_scheme != NULL));
+
 	/* Build the plan for each child */
 	foreach(subpaths, best_path->subpaths)
 	{
@@ -1077,8 +1083,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 		subplans = lappend(subplans, subplan);
 	}
 
-	if (enable_partition_pruning &&
-		rel->reloptkind == RELOPT_BASEREL &&
+	if (enable_partition_pruning && IS_SIMPLE_REL(rel) &&
 		best_path->partitioned_rels != NIL)
 	{
 		List	   *prunequal;
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index ab32c7d67e..6ad59a1afe 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2768,6 +2768,27 @@ select * from boolp where a = (select value from boolvalues where not value);
          Filter: (a = $0)
 (9 rows)
 
+-- Ensure runtime pruning works when partitioned tables are parented by
+-- UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from boolp union all select * from boolp) b where a = (select true);
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Append (actual rows=0 loops=1)
+         ->  Seq Scan on boolp_f (never executed)
+               Filter: (a = $0)
+         ->  Seq Scan on boolp_t (actual rows=0 loops=1)
+               Filter: (a = $0)
+   ->  Append (actual rows=0 loops=1)
+         ->  Seq Scan on boolp_f boolp_f_1 (never executed)
+               Filter: (a = $0)
+         ->  Seq Scan on boolp_t boolp_t_1 (actual rows=0 loops=1)
+               Filter: (a = $0)
+(13 rows)
+
 drop table boolp;
 reset enable_indexonlyscan;
 --
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 609fe09aeb..a7610ef550 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -701,6 +701,11 @@ select * from boolp where a = (select value from boolvalues where value);
 explain (analyze, costs off, summary off, timing off)
 select * from boolp where a = (select value from boolvalues where not value);
 
+-- Ensure runtime pruning works when partitioned tables are parented by
+-- UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from boolp union all select * from boolp) b where a = (select true);
+
 drop table boolp;
 
 reset enable_indexonlyscan;
#9 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#8)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018/06/11 16:49, David Rowley wrote:

On 11 June 2018 at 12:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 10 June 2018 at 04:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So, IIUC, the issue is that for partitioning cases Append expects *all*
its children to be partitions of the *same* partitioned table? That
is, you could also break it with

select * from partitioned_table_a
union all
select * from partitioned_table_b

?

Not quite.

That would be correct I think. An Append may contain multiple partitioned
tables that all appear under a UNION ALL parent, as in the OP's case and
the example above. In this case, the partitioned_rels list of the Append
consists of non-leaf tables from *all* of the partitioned tables. Before
run-time pruning arrived, the only purpose of the partitioned_rels list was
to make sure that the executor goes through it and locks all of those
non-leaf tables (ExecLockNonLeafAppendTables). Run-time pruning expanded
its usage by depending on it to generate run-time pruning info.

I just had a thought that might lead to a nice solution to that, or
might be totally crazy. What if we inverted the sense of the bitmaps
that track partition pruning state, so that instead of a bitmap of
valid partitions that need to be scanned, we had a bitmap of pruned
partitions that we know we don't need to scan? (The indexes of this
bitmap would be subplan indexes not partition indexes.) With this
representation, it doesn't matter if some of the Append's children
are not supposed to participate in pruning; they just don't ever get
added to the bitmap of what to skip. It's also fairly clear, I think,
how to handle independent pruning rules for different top-level tables
that are being unioned together: just OR the what-to-skip bitmaps.
But there may be some reason why this isn't workable.

I think it would be less efficient. A common case and one that I very
much would like to make as fast as possible is when all but a single
partition is pruned. Doing the opposite sounds like more effort would
need to be expended to get the subplans that we do need to scan.

I don't really see the way it works now as a huge problem to overcome
in pruning. We'd just need a list of subplans that don't belong to the
hierarchy and tag them on to the matches found in
ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans. The
bigger issue to overcome is the mixed flattened list of partition RT
indexes in partitioned_rels. Perhaps having a list of Lists for
partitioned_rels could be used to resolve that. The question is more,
how to solve for PG11. Do we need that?

I think we'll very soon be wanting to have ordered partition scans
where something like:

create table listp(a int) partition by list(a);
create index on listp(a);
create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);

and

select * from listp order by a;

would be possible with an Append and Index Scan, rather than having a
MergeAppend or Sort. In which case we'll not want mixed partition
hierarchies in the Append subplans. Although, perhaps that would mean
we just wouldn't pullup AppendPaths which have PathKeys.

I have written and attached the patch to stop flattening of
partitioned tables into UNION ALL parent's paths, meaning we can now
get nested Append and MergeAppend paths.

I've added Robert too as I know he was the committer of partitioning
and parallel Append. Maybe he has a view on what should be done about
this? Is not flattening the paths a problem?

Not speaking for Robert here, just saying from what I know.

I don't think your patch breaks anything, even if it does change the shape of
the plan. So, for:

select * from partitioned_table_a
union all
select * from partitioned_table_b

The only thing that changes with the patch is that
ExecLockNonLeafAppendTables is called *twice* for the two nested Appends
corresponding to partitioned_table_a and partitioned_table_b, resp.,
instead of just once for the top level Append corresponding to the UNION
ALL parent. In fact, when called for the top level Append,
ExecLockNonLeafAppendTables is now a no-op.

Thanks,
Amit

#10David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#9)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 15 June 2018 at 20:37, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

select * from partitioned_table_a
union all
select * from partitioned_table_b

The only thing that changes with the patch is that
ExecLockNonLeafAppendTables is called *twice* for the two nested Appends
corresponding to partitioned_table_a and partitioned_table_b, resp.,
instead of just once for the top level Append corresponding to the UNION
ALL parent. In fact, when called for the top level Append,
ExecLockNonLeafAppendTables is now a no-op.

If the top level Append is the UNION ALL one, then there won't be any
partitioned_rels. If that's what you mean by no-op then, yeah. There
are no duplicate locks already obtained in the parent with the child
Append node.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#10)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018/06/15 20:41, David Rowley wrote:

On 15 June 2018 at 20:37, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

select * from partitioned_table_a
union all
select * from partitioned_table_b

The only thing that changes with the patch is that
ExecLockNonLeafAppendTables is called *twice* for the two nested Appends
corresponding to partitioned_table_a and partitioned_table_b, resp.,
instead of just once for the top level Append corresponding to the UNION
ALL parent. In fact, when called for the top level Append,
ExecLockNonLeafAppendTables is now a no-op.

If the top level Append is the UNION ALL one, then there won't be any
partitioned_rels. If that's what you mean by no-op then, yeah. There
are no duplicate locks already obtained in the parent with the child
Append node.

Yeah, that's what I meant to say. I checked whether the locks end up
being taken twice, once in the UNION ALL parent's ExecInitAppend and then
again in the individual child Appends' ExecInitAppend, but they
don't, so the patch is right.
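
The locking argument can be checked with a toy model. The sketch below is standalone C with hypothetical types; init_append only stands in for what ExecInitAppend and ExecLockNonLeafAppendTables do for a node's partitioned_rels, and nothing here is the real executor code. It counts how often each RT index is locked for a flattened plan and for a nested one.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELS 16

/* Hypothetical, heavily simplified Append node. */
typedef struct AppendNode
{
    const int *partitioned_rels;    /* RT indexes this node must lock */
    size_t     nrels;
    const struct AppendNode *const *children;   /* nested Appends only */
    size_t     nchildren;
} AppendNode;

/*
 * Stand-in for node initialization: lock this node's partitioned rels,
 * then initialize any nested Appends.  lock_counts[rti] records how many
 * times RT index rti was locked.
 */
static void
init_append(const AppendNode *node, int *lock_counts)
{
    size_t i;

    for (i = 0; i < node->nrels; i++)
    {
        int rti = node->partitioned_rels[i];

        assert(rti >= 0 && rti < MAX_RELS);
        lock_counts[rti]++;
    }

    for (i = 0; i < node->nchildren; i++)
        init_append(node->children[i], lock_counts);
}
```

In the nested shape the top-level node has an empty partitioned_rels, so its locking step does nothing and each partitioned table is locked once by its own child Append, the same total as in the flattened shape.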

Thanks,
Amit

#12David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#11)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 18 June 2018 at 14:36, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2018/06/15 20:41, David Rowley wrote:

If the top level Append is the UNION ALL one, then there won't be any
partitioned_rels. If that's what you mean by no-op then, yeah. There
are no duplicate locks already obtained in the parent with the child
Append node.

Yeah, that's what I meant to say. I checked whether the locks end up
being taken twice, once in the UNION ALL parent's ExecInitAppend and then
again in the individual child Appends' ExecInitAppend, but they
don't, so the patch is right.

Thanks for looking.

Robert, do you have any objections to the proposed patch?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Robert Haas
robertmhaas@gmail.com
In reply to: David Rowley (#12)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On Sun, Jun 17, 2018 at 10:59 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

Thanks for looking.

Robert, do you have any objections to the proposed patch?

I don't have time to study this right now, but I think the main
possible objection is around performance. If not flattening the
Append is the best way to make queries run fast, then we should do it
that way. If making pruning capable of coping with mixed hierarchies
is going to be faster, then we should do that. If I were to speculate
in the absence of data, my guess would be that failing to flatten the
hierarchy is going to lead to a significant per-tuple cost, while the
cost of making run-time pruning smarter is likely to be incurred once
per rescan (i.e. a lot less). But that might be wrong, and it might
be impractical to get this working perfectly in v11 given the time we
have. But I would suggest that you performance test a query that ends
up feeding lots of tuples through two Append nodes rather than one and
see how much it hurts.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14David Rowley
david.rowley@2ndquadrant.com
In reply to: Robert Haas (#13)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 20 June 2018 at 02:28, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jun 17, 2018 at 10:59 PM, David Rowley

Robert, do you have any objections to the proposed patch?

I don't have time to study this right now, but I think the main
possible objection is around performance. If not flattening the
Append is the best way to make queries run fast, then we should do it
that way. If making pruning capable of coping with mixed hierarchies
is going to be faster, then we should do that. If I were to speculate
in the absence of data, my guess would be that failing to flatten the
hierarchy is going to lead to a significant per-tuple cost, while the
cost of making run-time pruning smarter is likely to be incurred once
per rescan (i.e. a lot less). But that might be wrong, and it might
be impractical to get this working perfectly in v11 given the time we
have. But I would suggest that you performance test a query that ends
up feeding lots of tuples through two Append nodes rather than one and
see how much it hurts.

I've performed two tests. One to see what the overhead of the
additional append is, and one to see what the saving from pruning away
unneeded partitions is. I tried to make the 2nd test use a realistic
number of partitions. Partition pruning will become more useful with
higher numbers of partitions.

Test 1: Test overhead of pulling tuples through an additional append

create table p (a int) partition by list (a);
create table p1 partition of p for values in(1);
insert into p select 1 from generate_series(1,1000000);
vacuum p1;
set max_parallel_workers_per_gather=0;

select count(*) from (select * from p union all select * from p) p;

Unpatched:
tps = 8.530355 (excluding connections establishing)

Patched:
tps = 7.853939 (excluding connections establishing)

Patched version takes 108.61% of the unpatched time.

Test 2: Tests time saved from run-time partition pruning and not
scanning the index on 23 of the partitions.

create table rp (d date) partition by range (d);
select 'CREATE TABLE rp' || x::text || ' PARTITION OF rp FOR VALUES
FROM (''' || '2017-01-01'::date + (x::text || ' month')::interval ||
''') TO (''' || '2017-01-01'::date + ((x+1)::text || '
month')::interval || ''');'
from generate_Series(0,23) x;
\gexec
insert into rp select d::date from
generate_series('2017-01-01','2018-12-31', interval '10 sec') d;
create index on rp (d);

select count(*) from (select * from rp union all select * from rp) rp
where d = current_date;

Unpatched: (set enable_partition_pruning = 0; to make it work)
tps = 260.969953 (excluding connections establishing)

Patched:
tps = 301.319038 (excluding connections establishing)

Patched version takes 86.61% of the unpatched time.

So, I don't think that really concludes much. I'd say the overhead
shown in test 1 is going to be a bit more predictable as it will
depend on how many tuples are being pulled through the additional
Append, but the savings shown in test 2 will vary. Having run-time
pruning not magically fail to work when the partitioned table is part
of a UNION ALL certainly seems less surprising.

If I drop the index from the "d" column in test 2, the performance gap
increases significantly and is roughly proportional to the number of
partitions.

Unpatched:
tps = 0.523691 (excluding connections establishing)

Patched:
tps = 13.453964 (excluding connections establishing)
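
For anyone wanting to re-derive the percentages from the tps figures: time per transaction is 1/tps, so the patched time relative to the unpatched time is tps_unpatched / tps_patched. A small standalone sketch (the function names are just for illustration):

```c
#include <assert.h>

/*
 * Time per transaction is 1/tps, so the patched run takes
 * (1/tps_patched) / (1/tps_unpatched) of the unpatched time.
 * Returned as a percentage.
 */
static double
relative_time_pct(double tps_unpatched, double tps_patched)
{
    assert(tps_unpatched > 0.0 && tps_patched > 0.0);
    return 100.0 * tps_unpatched / tps_patched;
}

/* How many times faster the patched run is. */
static double
speedup(double tps_unpatched, double tps_patched)
{
    assert(tps_unpatched > 0.0 && tps_patched > 0.0);
    return tps_patched / tps_unpatched;
}
```

relative_time_pct(8.530355, 7.853939) gives about 108.61 for test 1 and relative_time_pct(260.969953, 301.319038) gives about 86.61 for test 2. For the run without the index, speedup(0.523691, 13.453964) is about 25.7, in the same ballpark as the 24 partitions of rp.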

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#14)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 20 June 2018 at 13:20, David Rowley <david.rowley@2ndquadrant.com> wrote:

select count(*) from (select * from p union all select * from p) p;

Unpatched:
tps = 8.530355 (excluding connections establishing)

Patched:
tps = 7.853939 (excluding connections establishing)

I've been thinking about this and I'm not so happy about my earlier
proposed fix about not flattening the Appends for UNION ALL parents
with partitioned relation children. Apart from the additional
overhead of pulling all tuples through an additional Append node, I
also don't like it because we one day might decide to fix it and make
it flatten these again. It would be much nicer not to fool around
with the plan shapes like that from one release to the next.

So, today I decided to write some code to just make the UNION ALL
parents work with partitioned tables while maintaining the Append
hierarchy flattening.

I've only just completed reading back through all the code and I think
it's correct. I ended up changing add_paths_to_append_rel() so that
instead of performing concatenation on partitioned_rels from two UNION
ALL children, it creates a List of lists. I believe this list can
only end up with a 2-level hierarchy of partitioned rels. I tested
this and it seems to work as I expect, although I think I need to
study the code a bit more to ensure a deeper hierarchy can't happen. I need
to check if there are some cases where nested UNION ALLs cannot be flattened to
have a single UNION ALL parent. Supporting this did cause me to have
to check the List->type in a few places. I only saw one other place in
the code where this is done, so I figured that meant it was okay.
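
As a rough illustration of what checking the list type buys, here is a standalone sketch. It does not use PostgreSQL's List API; RelList, ListKind and both functions are hypothetical stand-ins. A flat list of RT indexes means a single hierarchy, while a list of lists means one hierarchy per sublist, and the code has to look at the tag to know which it was handed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag, playing the role of the list's type field. */
typedef enum ListKind
{
    KIND_INT_LIST,          /* flat list of RT indexes: one hierarchy */
    KIND_LIST_OF_LISTS      /* one flat sublist per hierarchy */
} ListKind;

typedef struct RelList
{
    ListKind kind;
    size_t   length;
    union
    {
        const int *ints;
        const struct RelList *const *lists;
    } items;
} RelList;

/* Number of separate partition hierarchies described by rels. */
static size_t
count_hierarchies(const RelList *rels)
{
    if (rels == NULL || rels->length == 0)
        return 0;
    return (rels->kind == KIND_INT_LIST) ? 1 : rels->length;
}

/* Total number of partitioned rels across all hierarchies. */
static size_t
count_partitioned_rels(const RelList *rels)
{
    size_t total = 0;
    size_t i;

    if (rels == NULL)
        return 0;
    if (rels->kind == KIND_INT_LIST)
        return rels->length;

    for (i = 0; i < rels->length; i++)
    {
        /* Only two levels are expected: sublists must be flat. */
        assert(rels->items.lists[i]->kind == KIND_INT_LIST);
        total += rels->items.lists[i]->length;
    }
    return total;
}
```

The assertion in count_partitioned_rels encodes the assumption that the nesting never goes deeper than two levels.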

On the executor side, I didn't add any pre-calculation code for each
partition hierarchy. I did this previously to save having to re-prune
on individual partitions, but I figured that was at a good enough
level not to have to add it for each partition hierarchy. I imagine
most partition hierarchies will just contain a single partitioned
table anyway, so an additional level of caching might not buy very
much, but I might just be trying to convince myself of that because
it's late here now.
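
The two-level arrangement on the executor side can be pictured with the simplified, standalone sketch below. The struct and function names are hypothetical and the pruning result is just a precomputed bitmap per partitioned table. Each hierarchy is walked from its root (element 0), following a subplan map for leaf partitions and a sub-partition map for sub-partitioned tables, with no extra caching per hierarchy.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARTS 8

/* Simplified stand-in for the pruning data of one partitioned table. */
typedef struct RelPruneData
{
    int      nparts;
    uint32_t matching_parts;            /* bit i set: partition i survives */
    int      subplan_map[MAX_PARTS];    /* subplan index, or -1 */
    int      subpart_map[MAX_PARTS];    /* index into rels[], or -1 */
} RelPruneData;

/* One partition hierarchy; rels[0] is its root. */
typedef struct HierarchyPruneData
{
    const RelPruneData *rels;
    int                 nrels;
} HierarchyPruneData;

/* Add the surviving subplans below rels[relidx] to *valid. */
static void
find_matching_recurse(const HierarchyPruneData *hier, int relidx,
                      uint64_t *valid)
{
    const RelPruneData *rel;
    int                 i;

    assert(relidx >= 0 && relidx < hier->nrels);
    rel = &hier->rels[relidx];
    assert(rel->nparts <= MAX_PARTS);

    for (i = 0; i < rel->nparts; i++)
    {
        if (!(rel->matching_parts & (UINT32_C(1) << i)))
            continue;           /* partition i was pruned */

        if (rel->subplan_map[i] >= 0)
        {
            assert(rel->subplan_map[i] < 64);
            *valid |= UINT64_C(1) << rel->subplan_map[i];
        }
        else if (rel->subpart_map[i] >= 0)
            find_matching_recurse(hier, rel->subpart_map[i], valid);
    }
}

/* Prune every hierarchy independently, starting at its root. */
static uint64_t
find_matching_subplans(const HierarchyPruneData *hiers, int nhiers)
{
    uint64_t valid = 0;
    int      h;

    for (h = 0; h < nhiers; h++)
        find_matching_recurse(&hiers[h], 0, &valid);

    return valid;
}
```

Because subpart_map indexes only into the same hierarchy's array, relids from different UNION ALL children can never be confused with each other.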

Anyway... Patch attached. This is method 3 of the 3 methods I thought
of to fix this, so if this is not suitable then I'm out of ideas.

It would be great to get some feedback on this as I'd really like to
be done with it before July.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

v1-0001-Fix-run-time-partition-pruning-for-UNION-ALL-pare.patch (application/octet-stream)
From 0fe7d6d606bbea6ec520bd960bdd9b6b90da50a1 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 26 Jun 2018 23:49:31 +1200
Subject: [PATCH v1] Fix run-time partition pruning for UNION ALL parents

The run-time partition pruning code added in 499be013d was unaware that the
partitioned_rels list that's built during add_paths_to_append_rel() could be
non-empty for relations other than just partitioned relations.  It can also
be set for UNION ALL parents where one or more union children are
partitioned tables.  This can cause the partitioned relids from multiple
partition hierarchies to become mixed in the partitioned_rels list.

This commit resolves that issue by never mixing the relids from different
UNION ALL children.  Instead we maintain a List of Lists containing the
partitioned relids.  This commit also adds all the new required code in
both the planner and executor to allow run-time pruning to work for UNION
ALL parents which query multiple partitioned tables.
---
 src/backend/executor/execPartition.c          | 389 +++++++++++++++-----------
 src/backend/executor/nodeAppend.c             |   4 +-
 src/backend/nodes/copyfuncs.c                 |  17 +-
 src/backend/nodes/outfuncs.c                  |  16 +-
 src/backend/nodes/readfuncs.c                 |  15 +-
 src/backend/optimizer/path/allpaths.c         |  14 +-
 src/backend/optimizer/plan/createplan.c       |  54 +++-
 src/backend/partitioning/partprune.c          | 293 ++++++++++++++-----
 src/include/executor/execPartition.h          |  25 +-
 src/include/nodes/nodes.h                     |   1 +
 src/include/nodes/plannodes.h                 |  39 ++-
 src/include/partitioning/partprune.h          |   3 +-
 src/test/regress/expected/partition_prune.out |  90 ++++++
 src/test/regress/sql/partition_prune.sql      |   8 +
 14 files changed, 706 insertions(+), 262 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7a4665cc4e..c8369fed81 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -48,8 +48,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 bool *isnull,
 									 int maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_matching_subplans_recurse(PartitionPruneState *prunestate,
-							   PartitionPruningData *pprune,
+static void find_matching_subplans_recurse(PartitionPruningData *pprune,
+							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans);
 
@@ -1394,34 +1394,44 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  *
  * 'planstate' is the parent plan node's execution state.
  *
- * 'partitionpruneinfo' is a List of PartitionPruneInfos as generated by
+ * 'partitionpruneinfo' is a PartitionPruneInfo as generated by
  * make_partition_pruneinfo.  Here we build a PartitionPruneState containing a
- * PartitionPruningData for each item in that List.  This data can be re-used
- * each time we re-evaluate which partitions match the pruning steps provided
- * in each PartitionPruneInfo.
+ * PartitionPruningData for each 'prune_infos' in 'partitionpruneinfo', in
+ * turn, a PartitionedRelPruningData is created for each
+ * PartitionedRelPruneInfo stored in the 'prune_infos'.  This two-level system
+ * is required in order to support run-time pruning with UNION ALL parents
+ * containing one or more partitioned tables as children.  The data stored in
+ * each PartitionedRelPruningData can be re-used each time we re-evaluate
+ * which partitions match the pruning steps provided in each
+ * PartitionedRelPruneInfo.
  */
 PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *partitionpruneinfo)
 {
 	PartitionPruneState *prunestate;
 	PartitionPruningData *prunedata;
 	ListCell   *lc;
-	int			i;
+	int			n_part_hierarchies;
+	int			i,
+				j;
+
+	Assert(partitionpruneinfo != NULL);
 
-	Assert(partitionpruneinfo != NIL);
+	n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
 
 	/*
 	 * Allocate the data structure
 	 */
 	prunestate = (PartitionPruneState *) palloc(sizeof(PartitionPruneState));
 	prunedata = (PartitionPruningData *)
-		palloc(sizeof(PartitionPruningData) * list_length(partitionpruneinfo));
+		palloc(sizeof(PartitionPruningData) * n_part_hierarchies);
 
 	prunestate->partprunedata = prunedata;
-	prunestate->num_partprunedata = list_length(partitionpruneinfo);
+	prunestate->num_partprunedata = n_part_hierarchies;
 	prunestate->do_initial_prune = false;	/* may be set below */
 	prunestate->do_exec_prune = false;	/* may be set below */
 	prunestate->execparamids = NULL;
+	prunestate->other_subplans = bms_copy(partitionpruneinfo->other_subplans);
 
 	/*
 	 * Create a short-term memory context which we'll use when making calls to
@@ -1435,113 +1445,128 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 							  ALLOCSET_DEFAULT_SIZES);
 
 	i = 0;
-	foreach(lc, partitionpruneinfo)
+	foreach(lc, partitionpruneinfo->prune_infos)
 	{
-		PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc));
-		PartitionPruningData *pprune = &prunedata[i];
-		PartitionPruneContext *context = &pprune->context;
-		PartitionDesc partdesc;
-		PartitionKey partkey;
-		int			partnatts;
-		int			n_steps;
 		ListCell   *lc2;
+		List	   *partrelpruneinfos = lfirst_node(List, lc);
+		PartitionedRelPruningData *partrelprunedata;
+		int			npartrelpruneinfos = list_length(partrelpruneinfos);
 
-		/*
-		 * We must copy the subplan_map rather than pointing directly to the
-		 * plan's version, as we may end up making modifications to it later.
-		 */
-		pprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
-		memcpy(pprune->subplan_map, pinfo->subplan_map,
-			   sizeof(int) * pinfo->nparts);
-
-		/* We can use the subpart_map verbatim, since we never modify it */
-		pprune->subpart_map = pinfo->subpart_map;
+		partrelprunedata = palloc(sizeof(PartitionedRelPruningData) *
+								  npartrelpruneinfos);
+		prunedata[i].partrelprunedata = partrelprunedata;
+		prunedata[i].num_partrelprunedata = npartrelpruneinfos;
 
-		/* present_parts is also subject to later modification */
-		pprune->present_parts = bms_copy(pinfo->present_parts);
-
-		/*
-		 * We need to hold a pin on the partitioned table's relcache entry so
-		 * that we can rely on its copies of the table's partition key and
-		 * partition descriptor.  We need not get a lock though; one should
-		 * have been acquired already by InitPlan or
-		 * ExecLockNonLeafAppendTables.
-		 */
-		context->partrel = relation_open(pinfo->reloid, NoLock);
-
-		partkey = RelationGetPartitionKey(context->partrel);
-		partdesc = RelationGetPartitionDesc(context->partrel);
-		n_steps = list_length(pinfo->pruning_steps);
-
-		context->strategy = partkey->strategy;
-		context->partnatts = partnatts = partkey->partnatts;
-		context->nparts = pinfo->nparts;
-		context->boundinfo = partdesc->boundinfo;
-		context->partcollation = partkey->partcollation;
-		context->partsupfunc = partkey->partsupfunc;
-
-		/* We'll look up type-specific support functions as needed */
-		context->stepcmpfuncs = (FmgrInfo *)
-			palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
-
-		context->ppccontext = CurrentMemoryContext;
-		context->planstate = planstate;
-
-		/* Initialize expression state for each expression we need */
-		context->exprstates = (ExprState **)
-			palloc0(sizeof(ExprState *) * n_steps * partnatts);
-		foreach(lc2, pinfo->pruning_steps)
+		j = 0;
+		foreach(lc2, partrelpruneinfos)
 		{
-			PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc2);
+			PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc2));
+			PartitionedRelPruningData *prelprune = &partrelprunedata[j];
+			PartitionPruneContext *context = &prelprune->context;
+			PartitionDesc partdesc;
+			PartitionKey partkey;
+			int			partnatts;
+			int			n_steps;
 			ListCell   *lc3;
-			int			keyno;
 
-			/* not needed for other step kinds */
-			if (!IsA(step, PartitionPruneStepOp))
-				continue;
+			/*
+			 * We must copy the subplan_map rather than pointing directly to
+			 * the plan's version, as we may end up making modifications to it
+			 * later.
+			 */
+			prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
+			memcpy(prelprune->subplan_map, pinfo->subplan_map,
+				   sizeof(int) * pinfo->nparts);
+
+			/* We can use the subpart_map verbatim, since we never modify it */
+			prelprune->subpart_map = pinfo->subpart_map;
 
-			Assert(list_length(step->exprs) <= partnatts);
+			/* present_parts is also subject to later modification */
+			prelprune->present_parts = bms_copy(pinfo->present_parts);
 
-			keyno = 0;
-			foreach(lc3, step->exprs)
+			/*
+			 * We need to hold a pin on the partitioned table's relcache entry
+			 * so that we can rely on its copies of the table's partition key
+			 * and partition descriptor.  We need not get a lock though; one
+			 * should have been acquired already by InitPlan or
+			 * ExecLockNonLeafAppendTables.
+			 */
+			context->partrel = relation_open(pinfo->reloid, NoLock);
+
+			partkey = RelationGetPartitionKey(context->partrel);
+			partdesc = RelationGetPartitionDesc(context->partrel);
+			n_steps = list_length(pinfo->pruning_steps);
+
+			context->strategy = partkey->strategy;
+			context->partnatts = partnatts = partkey->partnatts;
+			context->nparts = pinfo->nparts;
+			context->boundinfo = partdesc->boundinfo;
+			context->partcollation = partkey->partcollation;
+			context->partsupfunc = partkey->partsupfunc;
+
+			/* We'll look up type-specific support functions as needed */
+			context->stepcmpfuncs = (FmgrInfo *)
+				palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
+
+			context->ppccontext = CurrentMemoryContext;
+			context->planstate = planstate;
+
+			/* Initialize expression state for each expression we need */
+			context->exprstates = (ExprState **)
+				palloc0(sizeof(ExprState *) * n_steps * partnatts);
+			foreach(lc3, pinfo->pruning_steps)
 			{
-				Expr	   *expr = (Expr *) lfirst(lc3);
+				PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc3);
+				ListCell   *lc4;
+				int			keyno;
+
+				/* not needed for other step kinds */
+				if (!IsA(step, PartitionPruneStepOp))
+					continue;
+
+				Assert(list_length(step->exprs) <= partnatts);
 
-				/* not needed for Consts */
-				if (!IsA(expr, Const))
+				keyno = 0;
+				foreach(lc4, step->exprs)
 				{
-					int			stateidx = PruneCxtStateIdx(partnatts,
-															step->step.step_id,
-															keyno);
+					Expr	   *expr = (Expr *) lfirst(lc4);
 
-					context->exprstates[stateidx] =
-						ExecInitExpr(expr, context->planstate);
+					/* not needed for Consts */
+					if (!IsA(expr, Const))
+					{
+						int			stateidx = PruneCxtStateIdx(partnatts,
+																step->step.step_id,
+																keyno);
+
+						context->exprstates[stateidx] =
+							ExecInitExpr(expr, context->planstate);
+					}
+					keyno++;
 				}
-				keyno++;
 			}
-		}
 
-		/* Array is not modified at runtime, so just point to plan's copy */
-		context->exprhasexecparam = pinfo->hasexecparam;
+			/* Array is not modified at runtime, so just point to plan's copy */
+			context->exprhasexecparam = pinfo->hasexecparam;
 
-		pprune->pruning_steps = pinfo->pruning_steps;
-		pprune->do_initial_prune = pinfo->do_initial_prune;
-		pprune->do_exec_prune = pinfo->do_exec_prune;
+			prelprune->pruning_steps = pinfo->pruning_steps;
+			prelprune->do_initial_prune = pinfo->do_initial_prune;
+			prelprune->do_exec_prune = pinfo->do_exec_prune;
 
-		/* Record if pruning would be useful at any level */
-		prunestate->do_initial_prune |= pinfo->do_initial_prune;
-		prunestate->do_exec_prune |= pinfo->do_exec_prune;
+			/* Record if pruning would be useful at any level */
+			prunestate->do_initial_prune |= pinfo->do_initial_prune;
+			prunestate->do_exec_prune |= pinfo->do_exec_prune;
 
-		/*
-		 * Accumulate the IDs of all PARAM_EXEC Params affecting the
-		 * partitioning decisions at this plan node.
-		 */
-		prunestate->execparamids = bms_add_members(prunestate->execparamids,
-												   pinfo->execparamids);
+			/*
+			 * Accumulate the IDs of all PARAM_EXEC Params affecting the
+			 * partitioning decisions at this plan node.
+			 */
+			prunestate->execparamids = bms_add_members(prunestate->execparamids,
+													   pinfo->execparamids);
 
+			j++;
+		}
 		i++;
 	}
-
 	return prunestate;
 }
 
@@ -1555,13 +1580,21 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 void
 ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 {
-	int			i;
+	int			i,
+				j;
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
 		PartitionPruningData *pprune = &prunestate->partprunedata[i];
 
-		relation_close(pprune->context.partrel, NoLock);
+		for (j = 0; j < pprune->num_partrelprunedata; j++)
+		{
+			PartitionedRelPruningData *prelprune;
+
+			prelprune = &pprune->partrelprunedata[j];
+
+			relation_close(prelprune->context.partrel, NoLock);
+		}
 	}
 }
 
@@ -1581,31 +1614,42 @@ ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 Bitmapset *
 ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 {
-	PartitionPruningData *pprune;
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
+	int			i;
 
 	Assert(prunestate->do_initial_prune);
 
-	pprune = prunestate->partprunedata;
-
 	/*
 	 * Switch to a temp context to avoid leaking memory in the executor's
 	 * memory context.
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	/* Perform pruning without using PARAM_EXEC Params */
-	find_matching_subplans_recurse(prunestate, pprune, true, &result);
+	for (i = 0; i < prunestate->num_partprunedata; i++)
+	{
+		PartitionPruningData *pprune;
+		PartitionedRelPruningData *prelprune;
+
+		pprune = &prunestate->partprunedata[i];
+		prelprune = &pprune->partrelprunedata[0];
+
+		/* Perform pruning without using PARAM_EXEC Params */
+		find_matching_subplans_recurse(pprune, prelprune, true, &result);
+
+		/* Expression eval may have used space in node's ps_ExprContext too */
+		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
 	result = bms_copy(result);
 
+	/* Add in any subplans which partition pruning didn't account for */
+	result = bms_add_members(result, prunestate->other_subplans);
+
 	MemoryContextReset(prunestate->prune_context);
-	/* Expression eval may have used space in node's ps_ExprContext too */
-	ResetExprContext(pprune->context.planstate->ps_ExprContext);
 
 	/*
 	 * If any subplans were pruned, we must re-sequence the subplan indexes so
@@ -1633,59 +1677,70 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 		}
 
 		/*
-		 * Now we can update each PartitionPruneInfo's subplan_map with new
-		 * subplan indexes.  We must also recompute its present_parts bitmap.
-		 * We perform this loop in back-to-front order so that we determine
-		 * present_parts for the lowest-level partitioned tables first.  This
-		 * way we can tell whether a sub-partitioned table's partitions were
-		 * entirely pruned so we can exclude that from 'present_parts'.
+		 * Now we can update each PartitionedRelPruneInfo's subplan_map with
+		 * new subplan indexes.  We must also recompute its present_parts
+		 * bitmap. We perform this loop in back-to-front order so that we
+		 * determine present_parts for the lowest-level partitioned tables
+		 * first.  This way we can tell whether a sub-partitioned table's
+		 * partitions were entirely pruned so we can exclude that from
+		 * 'present_parts'.
 		 */
-		for (i = prunestate->num_partprunedata - 1; i >= 0; i--)
+
+		for (i = 0; i < prunestate->num_partprunedata; i++)
 		{
-			int			nparts;
 			int			j;
+			PartitionPruningData *prelpruneinfo;
 
-			pprune = &prunestate->partprunedata[i];
-			nparts = pprune->context.nparts;
-			/* We just rebuild present_parts from scratch */
-			bms_free(pprune->present_parts);
-			pprune->present_parts = NULL;
+			prelpruneinfo = &prunestate->partprunedata[i];
 
-			for (j = 0; j < nparts; j++)
+			for (j = prelpruneinfo->num_partrelprunedata - 1; j >= 0; j--)
 			{
-				int			oldidx = pprune->subplan_map[j];
-				int			subidx;
+				PartitionedRelPruningData *pprune;
+				int			nparts;
+				int			k;
 
-				/*
-				 * If this partition existed as a subplan then change the old
-				 * subplan index to the new subplan index.  The new index may
-				 * become -1 if the partition was pruned above, or it may just
-				 * come earlier in the subplan list due to some subplans being
-				 * removed earlier in the list.  If it's a subpartition, add
-				 * it to present_parts unless it's entirely pruned.
-				 */
-				if (oldidx >= 0)
-				{
-					Assert(oldidx < nsubplans);
-					pprune->subplan_map[j] = new_subplan_indexes[oldidx];
+				pprune = &prelpruneinfo->partrelprunedata[j];
+				nparts = pprune->context.nparts;
+				/* We just rebuild present_parts from scratch */
+				bms_free(pprune->present_parts);
+				pprune->present_parts = NULL;
 
-					if (new_subplan_indexes[oldidx] >= 0)
-						pprune->present_parts =
-							bms_add_member(pprune->present_parts, j);
-				}
-				else if ((subidx = pprune->subpart_map[j]) >= 0)
+				for (k = 0; k < nparts; k++)
 				{
-					PartitionPruningData *subprune;
+					int			oldidx = pprune->subplan_map[k];
+					int			subidx;
 
-					subprune = &prunestate->partprunedata[subidx];
+					/*
+					 * If this partition existed as a subplan then change the
+					 * old subplan index to the new subplan index.  The new
+					 * index may become -1 if the partition was pruned above,
+					 * or it may just come earlier in the subplan list due to
+					 * some subplans being removed earlier in the list.  If
+					 * it's a subpartition, add it to present_parts unless
+					 * it's entirely pruned.
+					 */
+					if (oldidx >= 0)
+					{
+						Assert(oldidx < nsubplans);
+						pprune->subplan_map[k] = new_subplan_indexes[oldidx];
 
-					if (!bms_is_empty(subprune->present_parts))
-						pprune->present_parts =
-							bms_add_member(pprune->present_parts, j);
+						if (new_subplan_indexes[oldidx] >= 0)
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, k);
+					}
+					else if ((subidx = pprune->subpart_map[k]) >= 0)
+					{
+						PartitionedRelPruningData *subprune;
+
+						subprune = &prelpruneinfo->partrelprunedata[subidx];
+
+						if (!bms_is_empty(subprune->present_parts))
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, k);
+					}
 				}
 			}
 		}
-
 		pfree(new_subplan_indexes);
 	}
 
@@ -1702,11 +1757,9 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 Bitmapset *
 ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 {
-	PartitionPruningData *pprune;
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
-
-	pprune = prunestate->partprunedata;
+	int			i;
 
 	/*
 	 * Switch to a temp context to avoid leaking memory in the executor's
@@ -1714,16 +1767,29 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	find_matching_subplans_recurse(prunestate, pprune, false, &result);
+	for (i = 0; i < prunestate->num_partprunedata; i++)
+	{
+		PartitionPruningData *pprune;
+		PartitionedRelPruningData *prelprune;
+
+		pprune = &prunestate->partprunedata[i];
+		prelprune = &pprune->partrelprunedata[0];
+
+		find_matching_subplans_recurse(pprune, prelprune, false, &result);
+
+		/* Expression eval may have used space in node's ps_ExprContext too */
+		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
 	result = bms_copy(result);
 
+	/* Add in any subplans which partition pruning didn't account for */
+	result = bms_add_members(result, prunestate->other_subplans);
+
 	MemoryContextReset(prunestate->prune_context);
-	/* Expression eval may have used space in node's ps_ExprContext too */
-	ResetExprContext(pprune->context.planstate->ps_ExprContext);
 
 	return result;
 }
@@ -1736,8 +1802,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
  * Adds valid (non-prunable) subplan IDs to *validsubplans
  */
 static void
-find_matching_subplans_recurse(PartitionPruneState *prunestate,
-							   PartitionPruningData *pprune,
+find_matching_subplans_recurse(PartitionPruningData *pprune,
+							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans)
 {
@@ -1748,15 +1814,16 @@ find_matching_subplans_recurse(PartitionPruneState *prunestate,
 	check_stack_depth();
 
 	/* Only prune if pruning would be useful at this level. */
-	if (initial_prune ? pprune->do_initial_prune : pprune->do_exec_prune)
+	if (initial_prune ? prelprune->do_initial_prune :
+						prelprune->do_exec_prune)
 	{
-		PartitionPruneContext *context = &pprune->context;
+		PartitionPruneContext *context = &prelprune->context;
 
 		/* Set whether we can evaluate PARAM_EXEC Params or not */
 		context->evalexecparams = !initial_prune;
 
 		partset = get_matching_partitions(context,
-										  pprune->pruning_steps);
+										  prelprune->pruning_steps);
 	}
 	else
 	{
@@ -1764,23 +1831,23 @@ find_matching_subplans_recurse(PartitionPruneState *prunestate,
 		 * If no pruning is to be done, just include all partitions at this
 		 * level.
 		 */
-		partset = pprune->present_parts;
+		partset = prelprune->present_parts;
 	}
 
 	/* Translate partset into subplan indexes */
 	i = -1;
 	while ((i = bms_next_member(partset, i)) >= 0)
 	{
-		if (pprune->subplan_map[i] >= 0)
+		if (prelprune->subplan_map[i] >= 0)
 			*validsubplans = bms_add_member(*validsubplans,
-											pprune->subplan_map[i]);
+											prelprune->subplan_map[i]);
 		else
 		{
-			int			partidx = pprune->subpart_map[i];
+			int			partidx = prelprune->subpart_map[i];
 
 			if (partidx >= 0)
-				find_matching_subplans_recurse(prunestate,
-											   &prunestate->partprunedata[partidx],
+				find_matching_subplans_recurse(pprune,
+											   &pprune->partrelprunedata[partidx],
 											   initial_prune, validsubplans);
 			else
 			{
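The recursion in find_matching_subplans_recurse above can be pictured with a much smaller model. The following is only a sketch, not the executor code: `RelPrune`, `collect_subplans` and `demo_collect_subplans` are hypothetical names, the result is a plain bitmask instead of a Bitmapset, no pruning steps are evaluated, and partitions that map to neither a subplan nor a sub-partitioned table are simply skipped.

```c
/* Hypothetical, simplified stand-in for PartitionedRelPruningData. */
typedef struct RelPrune
{
    int         nparts;         /* number of partitions at this level */
    const int  *subplan_map;    /* subplan index per partition, or -1 */
    const int  *subpart_map;    /* index of sub-partitioned rel, or -1 */
} RelPrune;

/*
 * Collect the subplan indexes reachable from rels[relidx] into a bitmask.
 * Every partition is treated as surviving; only the index translation
 * and the descent into sub-partitioned tables are modelled.
 */
static unsigned int
collect_subplans(const RelPrune *rels, int relidx)
{
    const RelPrune *rel = &rels[relidx];
    unsigned int result = 0;
    int         i;

    for (i = 0; i < rel->nparts; i++)
    {
        if (rel->subplan_map[i] >= 0)
            result |= 1u << rel->subplan_map[i];
        else if (rel->subpart_map[i] >= 0)
            result |= collect_subplans(rels, rel->subpart_map[i]);
        /* neither mapping set: skipped in this sketch */
    }

    return result;
}

/* One hierarchy: a top-level table whose second partition is itself partitioned. */
static unsigned int
demo_collect_subplans(void)
{
    static const int top_subplans[] = {0, -1, -1};
    static const int top_subparts[] = {-1, 1, -1};
    static const int sub_subplans[] = {1, 2};
    static const int sub_subparts[] = {-1, -1};
    const RelPrune rels[] = {
        {3, top_subplans, top_subparts},
        {2, sub_subplans, sub_subparts},
    };

    return collect_subplans(rels, 0);
}
```

In this model the sub-partition index only needs to be valid within one array, which mirrors why the patched function indexes pprune->partrelprunedata (one hierarchy) rather than a single array shared by every hierarchy under the Append.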
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5ce4fb43e1..97451ed820 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -129,7 +129,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
 
 	/* If run-time partition pruning is enabled, then set that up now */
-	if (node->part_prune_infos != NIL)
+	if (node->part_prune_info)
 	{
 		PartitionPruneState *prunestate;
 
@@ -138,7 +138,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 
 		/* Create the working data structure for pruning. */
 		prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-												   node->part_prune_infos);
+												   node->part_prune_info);
 		appendstate->as_prune_state = prunestate;
 
 		/* Perform an initial partition prune, if required. */
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1c12075b01..9c71cde14a 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -245,7 +245,7 @@ _copyAppend(const Append *from)
 	COPY_NODE_FIELD(appendplans);
 	COPY_SCALAR_FIELD(first_partial_plan);
 	COPY_NODE_FIELD(partitioned_rels);
-	COPY_NODE_FIELD(part_prune_infos);
+	COPY_NODE_FIELD(part_prune_info);
 
 	return newnode;
 }
@@ -1181,6 +1181,18 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
 {
 	PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
 
+	COPY_NODE_FIELD(prune_infos);
+	COPY_BITMAPSET_FIELD(other_subplans);
+
+	return newnode;
+}
+
+
+static PartitionedRelPruneInfo *
+_copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
+{
+	PartitionedRelPruneInfo *newnode = makeNode(PartitionedRelPruneInfo);
+
 	COPY_SCALAR_FIELD(reloid);
 	COPY_NODE_FIELD(pruning_steps);
 	COPY_BITMAPSET_FIELD(present_parts);
@@ -4907,6 +4919,9 @@ copyObjectImpl(const void *from)
 		case T_PartitionPruneInfo:
 			retval = _copyPartitionPruneInfo(from);
 			break;
+		case T_PartitionedRelPruneInfo:
+			retval = _copyPartitionedRelPruneInfo(from);
+			break;
 		case T_PartitionPruneStepOp:
 			retval = _copyPartitionPruneStepOp(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 979d523e00..ef599342a8 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -402,7 +402,7 @@ _outAppend(StringInfo str, const Append *node)
 	WRITE_NODE_FIELD(appendplans);
 	WRITE_INT_FIELD(first_partial_plan);
 	WRITE_NODE_FIELD(partitioned_rels);
-	WRITE_NODE_FIELD(part_prune_infos);
+	WRITE_NODE_FIELD(part_prune_info);
 }
 
 static void
@@ -1012,10 +1012,19 @@ _outPlanRowMark(StringInfo str, const PlanRowMark *node)
 
 static void
 _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
+{
+	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
+
+	WRITE_NODE_FIELD(prune_infos);
+	WRITE_BITMAPSET_FIELD(other_subplans);
+}
+
+static void
+_outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
 {
 	int			i;
 
-	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
+	WRITE_NODE_TYPE("PARTITIONEDRELPRUNEINFO");
 
 	WRITE_OID_FIELD(reloid);
 	WRITE_NODE_FIELD(pruning_steps);
@@ -3830,6 +3839,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionPruneInfo:
 				_outPartitionPruneInfo(str, obj);
 				break;
+			case T_PartitionedRelPruneInfo:
+				_outPartitionedRelPruneInfo(str, obj);
+				break;
 			case T_PartitionPruneStepOp:
 				_outPartitionPruneStepOp(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42aff7f57a..ea4e8df62e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1612,7 +1612,7 @@ _readAppend(void)
 	READ_NODE_FIELD(appendplans);
 	READ_INT_FIELD(first_partial_plan);
 	READ_NODE_FIELD(partitioned_rels);
-	READ_NODE_FIELD(part_prune_infos);
+	READ_NODE_FIELD(part_prune_info);
 
 	READ_DONE();
 }
@@ -2328,6 +2328,17 @@ _readPartitionPruneInfo(void)
 {
 	READ_LOCALS(PartitionPruneInfo);
 
+	READ_NODE_FIELD(prune_infos);
+	READ_BITMAPSET_FIELD(other_subplans);
+
+	READ_DONE();
+}
+
+static PartitionedRelPruneInfo *
+_readPartitionedRelPruneInfo(void)
+{
+	READ_LOCALS(PartitionedRelPruneInfo);
+
 	READ_OID_FIELD(reloid);
 	READ_NODE_FIELD(pruning_steps);
 	READ_BITMAPSET_FIELD(present_parts);
@@ -2725,6 +2736,8 @@ parseNodeString(void)
 		return_value = _readPlanRowMark();
 	else if (MATCH("PARTITIONPRUNEINFO", 18))
 		return_value = _readPartitionPruneInfo();
+	else if (MATCH("PARTITIONEDRELPRUNEINFO", 23))
+		return_value = _readPartitionedRelPruneInfo();
 	else if (MATCH("PARTITIONPRUNESTEPOP", 20))
 		return_value = _readPartitionPruneStepOp();
 	else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3ada379f8b..2adbebcd35 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1455,14 +1455,22 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		/*
 		 * If we need to build partitioned_rels, accumulate the partitioned
 		 * rels for this child.  We must ensure that parents are always listed
-		 * before their child partitioned tables.
+		 * before their child partitioned tables.  For UNION ALL parents, we
+		 * store a List of Lists containing the child relids, so as not to
+		 * mix different partition hierarchies.
 		 */
 		if (build_partitioned_rels)
 		{
 			List	   *cprels = childrel->partitioned_child_rels;
 
-			partitioned_rels = list_concat(partitioned_rels,
-										   list_copy(cprels));
+			if (rel->rtekind == RTE_SUBQUERY)
+			{
+				if (cprels != NIL)
+					partitioned_rels = lappend(partitioned_rels, cprels);
+			}
+			else
+				partitioned_rels = list_concat(partitioned_rels,
+											   list_copy(cprels));
 		}
 
 		/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cf82b7052d..3a69baf9df 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -133,6 +133,7 @@ static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
 static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 					  List **qual, List **indexqual, List **indexECs);
 static void bitmap_subplan_mark_shared(Plan *plan);
+static List *flatten_partitioned_rels(List *partitioned_rels);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 					List *tlist, List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
@@ -211,7 +212,8 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
 static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
 				   Index scanrelid, int wtParam);
 static Append *make_append(List *appendplans, int first_partial_plan,
-			List *tlist, List *partitioned_rels, List *partpruneinfos);
+			List *tlist, List *partitioned_rels,
+			PartitionPruneInfo *partpruneinfo);
 static RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree,
 					 Plan *righttree,
@@ -1039,7 +1041,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	List	   *subplans = NIL;
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
-	List	   *partpruneinfos = NIL;
+	PartitionPruneInfo *partpruneinfo = NULL;
 
 	/*
 	 * The subpaths list could be empty, if every child was proven empty by
@@ -1099,13 +1101,12 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 		/*
 		 * If any quals exist, they may be useful to perform further partition
-		 * pruning during execution.  Generate a PartitionPruneInfo for each
-		 * partitioned rel to store these quals and allow translation of
-		 * partition indexes into subpath indexes.
+		 * pruning during execution.  Attempt to generate a PartitionPruneInfo
+		 * object to allow further pruning to be done during execution.
 		 */
 		if (prunequal != NIL)
-			partpruneinfos =
-				make_partition_pruneinfo(root,
+			partpruneinfo =
+				make_partition_pruneinfo(root, rel,
 										 best_path->partitioned_rels,
 										 best_path->subpaths, prunequal);
 	}
@@ -1119,7 +1120,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 	plan = make_append(subplans, best_path->first_partial_path,
 					   tlist, best_path->partitioned_rels,
-					   partpruneinfos);
+					   partpruneinfo);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
@@ -5074,6 +5075,35 @@ bitmap_subplan_mark_shared(Plan *plan)
 		elog(ERROR, "unrecognized node type: %d", nodeTag(plan));
 }
 
+/*
+ * flatten_partitioned_rels
+ *		Flatten up to a 2-deep List hierarchy of relids.
+ */
+static List *
+flatten_partitioned_rels(List *partitioned_rels)
+{
+	if (partitioned_rels == NIL)
+		return NIL;
+	else if (partitioned_rels->type == T_IntList)
+		return partitioned_rels;
+	else
+	{
+		ListCell   *lc;
+		List	   *newlist = NIL;
+
+		foreach(lc, partitioned_rels)
+		{
+			List	   *sublist = lfirst(lc);
+
+			Assert(sublist->type == T_IntList);
+
+			newlist = list_concat(newlist, list_copy(sublist));
+		}
+
+		return newlist;
+	}
+}
+
 /*****************************************************************************
  *
  *	PLAN NODE BUILDING ROUTINES
@@ -5417,7 +5447,7 @@ make_foreignscan(List *qptlist,
 static Append *
 make_append(List *appendplans, int first_partial_plan,
 			List *tlist, List *partitioned_rels,
-			List *partpruneinfos)
+			PartitionPruneInfo *partpruneinfo)
 {
 	Append	   *node = makeNode(Append);
 	Plan	   *plan = &node->plan;
@@ -5428,8 +5458,8 @@ make_append(List *appendplans, int first_partial_plan,
 	plan->righttree = NULL;
 	node->appendplans = appendplans;
 	node->first_partial_plan = first_partial_plan;
-	node->partitioned_rels = partitioned_rels;
-	node->part_prune_infos = partpruneinfos;
+	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
+	node->part_prune_info = partpruneinfo;
 	return node;
 }
 
@@ -6586,7 +6616,7 @@ make_modifytable(PlannerInfo *root,
 	node->operation = operation;
 	node->canSetTag = canSetTag;
 	node->nominalRelation = nominalRelation;
-	node->partitioned_rels = partitioned_rels;
+	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
 	node->partColsUpdated = partColsUpdated;
 	node->resultRelations = resultRelations;
 	node->resultRelIndex = -1;	/* will be set correctly in setrefs.c */
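As a rough illustration of what flatten_partitioned_rels does to a List of Lists, here is a standalone sketch that concatenates several integer arrays into one. It does not use the PostgreSQL List API; `IntArray` and `flatten_int_arrays` are hypothetical names, and memory is managed with malloc rather than palloc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one inner list of relids. */
typedef struct IntArray
{
    const int  *items;
    size_t      len;
} IntArray;

/*
 * Concatenate 'nlists' inner arrays into one newly allocated array,
 * preserving order.  Returns NULL with *outlen = 0 when there is nothing
 * to copy or when allocation fails.  The caller frees the result.
 */
static int *
flatten_int_arrays(const IntArray *lists, size_t nlists, size_t *outlen)
{
    size_t      total = 0;
    size_t      pos = 0;
    size_t      i;
    int        *flat;

    *outlen = 0;

    for (i = 0; i < nlists; i++)
        total += lists[i].len;

    if (total == 0)
        return NULL;

    flat = malloc(total * sizeof(int));
    if (flat == NULL)
        return NULL;

    for (i = 0; i < nlists; i++)
    {
        if (lists[i].len > 0)
            memcpy(flat + pos, lists[i].items, lists[i].len * sizeof(int));
        pos += lists[i].len;
    }

    *outlen = total;
    return flat;
}
```

As in the patch, each inner list is copied rather than linked in, so the nested input is left untouched.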
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index cdc61a8997..218cac71ef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -111,7 +111,11 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
-
+static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
+							  RelOptInfo *parentrel,
+							  int *relid_subplan_map,
+							  List *partitioned_rels, List *prunequal,
+							  Bitmapset **matchedsubplans);
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
 static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -160,8 +164,8 @@ static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
 static Bitmapset *pull_exec_paramids(Expr *expr);
 static bool pull_exec_paramids_walker(Node *node, Bitmapset **context);
-static bool analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps,
-					  int partnatts);
+static bool analyze_partkey_exprs(PartitionedRelPruneInfo *prelinfo,
+					  List *steps, int partnatts);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
 static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
@@ -176,38 +180,36 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context,
 
 /*
  * make_partition_pruneinfo
- *		Build List of PartitionPruneInfos, one for each partitioned rel.
- *		These can be used in the executor to allow additional partition
- *		pruning to take place.
+ *		Builds a PartitionPruneInfo which can be used in the executor to allow
+ *		additional partition pruning to take place.  Returns NULL when
+ *		partition pruning would be useless.
  *
- * Here we generate partition pruning steps for 'prunequal' and also build a
- * data structure which allows mapping of partition indexes into 'subpaths'
- * indexes.
+ * Here we build a PartitionedRelPruneInfo for each partitioned relation in
+ * 'partitioned_rels'.  This list can either be a plain list of the relids of
+ * the partitioned relations, or a List of Lists of such relids.
  *
- * If no non-Const expressions are being compared to the partition key in any
- * of the 'partitioned_rels', then we return NIL to indicate no run-time
- * pruning should be performed.  Run-time pruning would be useless, since the
- * pruning done during planning will have pruned everything that can be.
+ * Any subpaths which could not be matched to a partitioned rel are set in
+ * the returned PartitionPruneInfo's 'other_subplans'.  Callers will likely
+ * want to ensure that subplans listed here are not pruned.
  */
-List *
-make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
-						 List *subpaths, List *prunequal)
+PartitionPruneInfo *
+make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
+						 List *partitioned_rels, List *subpaths,
+						 List *prunequal)
 {
-	RelOptInfo *targetpart = NULL;
-	List	   *pinfolist = NIL;
-	bool		doruntimeprune = false;
+	PartitionPruneInfo *pruneinfo;
+	Bitmapset  *allmatchedsubplans = NULL;
 	int		   *relid_subplan_map;
-	int		   *relid_subpart_map;
 	ListCell   *lc;
+	List	   *prunerelinfos;
 	int			i;
 
 	/*
-	 * Construct two temporary arrays to map from planner relids to subplan
-	 * and sub-partition indexes.  For convenience, we use 1-based indexes
-	 * here, so that zero can represent an un-filled array entry.
+	 * Construct a temporary array to map from planner relids to subplan
+	 * indexes.  For convenience, we use 1-based indexes here, so that zero
+	 * can represent an un-filled array entry.
 	 */
 	relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
-	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
 
 	/*
 	 * relid_subplan_map maps relid of a leaf partition to the index in
@@ -227,10 +229,132 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		relid_subplan_map[pathrel->relid] = i++;
 	}
 
+	/*
+	 * We now build a PartitionedRelPruneInfo for each partitioned rel.  For
+	 * UNION ALL parents, the partitioned_rels will be a List of Lists, but
+	 * when the parent is a partitioned table, this will just be a list of
+	 * ints.
+	 */
+	if (partitioned_rels->type == T_IntList)
+	{
+		Bitmapset  *matchedsubplans = NULL;
+		List	   *prelinfolist;
+
+		prelinfolist = make_partitionedrel_pruneinfo(root, parentrel,
+													 relid_subplan_map,
+													 partitioned_rels, prunequal,
+													 &matchedsubplans);
+
+		/* Only record matchedsubplans if pruning will be performed */
+		if (prelinfolist != NIL)
+		{
+			prunerelinfos = list_make1(prelinfolist);
+			allmatchedsubplans = matchedsubplans;
+		}
+		else
+			prunerelinfos = NIL;
+	}
+	else
+	{
+		Assert(partitioned_rels->type == T_List);
+
+		prunerelinfos = NIL;
+
+		foreach(lc, partitioned_rels)
+		{
+			List	   *rels = lfirst(lc);
+			List	   *prelinfolist;
+			Bitmapset  *matchedsubplans = NULL;
+
+			prelinfolist = make_partitionedrel_pruneinfo(root, parentrel,
+														 relid_subplan_map,
+														 rels, prunequal,
+
+														 &matchedsubplans);
+
+			/* Only record matchedsubplans if pruning will be performed */
+			if (prelinfolist != NIL)
+			{
+				prunerelinfos = lappend(prunerelinfos, prelinfolist);
+				allmatchedsubplans = bms_join(matchedsubplans,
+											  allmatchedsubplans);
+			}
+		}
+	}
+
+	pfree(relid_subplan_map);
+
+	/* No run-time pruning required. */
+	if (prunerelinfos == NIL)
+		return NULL;
+
+	pruneinfo = makeNode(PartitionPruneInfo);
+	pruneinfo->prune_infos = prunerelinfos;
+
+	/*
+	 * Some subplans may not belong to any of the listed partitioned_rels.  This
+	 * can happen for UNION ALL queries which include a non-partitioned table.
+	 * We record all of the subplans which we didn't build any
+	 * PartitionedRelPruneInfo for so that callers can easily identify which
+	 * subplans should not be pruned.
+	 */
+	if (bms_num_members(allmatchedsubplans) < list_length(subpaths))
+	{
+		Bitmapset  *other_subplans;
+
+		/* Create an inverted set of allmatchedsubplans */
+		other_subplans = bms_add_range(NULL, 0, list_length(subpaths) - 1);
+		other_subplans = bms_del_members(other_subplans, allmatchedsubplans);
+
+		pruneinfo->other_subplans = other_subplans;
+	}
+	else
+		pruneinfo->other_subplans = NULL;
+
+	return pruneinfo;
+}
+
+/*
+ * make_partitionedrel_pruneinfo
+ *		Build List of PartitionedRelPruneInfos, one for each partitioned rel.
+ *		These can be used in the executor to allow additional partition
+ *		pruning to take place.
+ *
+ * Here we generate partition pruning steps for 'prunequal' and also build a
+ * data structure which allows mapping of partition indexes into 'subpaths'
+ * indexes.
+ *
+ * If no non-Const expressions are being compared to the partition key in any
+ * of the 'partitioned_rels', then we return NIL to indicate no run-time
+ * pruning should be performed.  Run-time pruning would be useless, since the
+ * pruning done during planning will have pruned everything that can be.
+ */
+static List *
+make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
+							  int *relid_subplan_map,
+							  List *partitioned_rels, List *prunequal,
+							  Bitmapset **matchedsubplans)
+{
+	RelOptInfo *targetpart = NULL;
+	List	   *prelinfolist = NIL;
+	bool		doruntimeprune = false;
+	bool		hascontradictingquals = false;
+	ListCell   *lc;
+	int		   *relid_subpart_map;
+	Bitmapset  *subplansfound = NULL;
+	int			i;
+
+	/*
+	 * Construct a temporary array to map from planner relids to index of the
+	 * partitioned_rel.  For convenience, we use 1-based indexes here, so that
+	 * zero can represent an un-filled array entry.
+	 */
+	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
+
 	/*
 	 * relid_subpart_map maps relid of a non-leaf partition to the index in
 	 * 'partitioned_rels' of that rel (which will also be the index in the
-	 * returned PartitionPruneInfo list of the info for that partition).
+	 * returned PartitionedRelPruneInfo list of the info for that partition).
 	 */
 	i = 1;
 	foreach(lc, partitioned_rels)
@@ -246,12 +370,12 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		relid_subpart_map[rti] = i++;
 	}
 
-	/* We now build a PartitionPruneInfo for each partitioned rel */
+	/* We now build a PartitionedRelPruneInfo for each partitioned rel */
 	foreach(lc, partitioned_rels)
 	{
 		Index		rti = lfirst_int(lc);
 		RelOptInfo *subpart = find_base_rel(root, rti);
-		PartitionPruneInfo *pinfo;
+		PartitionedRelPruneInfo *prelinfo;
 		RangeTblEntry *rte;
 		Bitmapset  *present_parts;
 		int			nparts = subpart->nparts;
@@ -269,6 +393,31 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		if (!targetpart)
 		{
 			targetpart = subpart;
+
+			/*
+			 * When the first listed partitioned table is not the same rel as
+			 * 'parentrel', then we must be dealing with a UNION ALL
+			 * parentrel.  We'd better translate the pruning qual so that it's
+			 * compatible with the top-level partitioned table.  We overwrite
+			 * the input parameter here so that subsequent translations for
+			 * sub-partitioned tables translate from the top-level partitioned
+			 * table, rather than the UNION ALL parent.
+			 */
+			if (parentrel != subpart)
+			{
+				int			nappinfos;
+				AppendRelInfo **appinfos = find_appinfos_by_relids(root,
+																   subpart->relids,
+																   &nappinfos);
+
+				prunequal = (List *) adjust_appendrel_attrs(root, (Node *)
+															prunequal,
+															nappinfos,
+															appinfos);
+
+				pfree(appinfos);
+			}
+
 			partprunequal = prunequal;
 		}
 		else
@@ -288,19 +437,7 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		pruning_steps = gen_partprune_steps(subpart, partprunequal,
 											&contradictory);
 
-		if (contradictory)
-		{
-			/*
-			 * This shouldn't happen as the planner should have detected this
-			 * earlier. However, we do use additional quals from parameterized
-			 * paths here. These do only compare Params to the partition key,
-			 * so this shouldn't cause the discovery of any new qual
-			 * contradictions that were not previously discovered as the Param
-			 * values are unknown during planning.  Anyway, we'd better do
-			 * something sane here, so let's just disable run-time pruning.
-			 */
-			return NIL;
-		}
+		hascontradictingquals |= contradictory;
 
 		/*
 		 * Construct the subplan and subpart maps for this partitioning level.
@@ -320,32 +457,56 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 
 			subplan_map[i] = subplanidx;
 			subpart_map[i] = subpartidx;
-			if (subplanidx >= 0 || subpartidx >= 0)
+			if (subplanidx >= 0)
+			{
+				present_parts = bms_add_member(present_parts, i);
+
+				/* Record finding this subplan  */
+				/* Record finding this subplan */
+			}
+			else if (subpartidx >= 0)
 				present_parts = bms_add_member(present_parts, i);
 		}
 
+
 		rte = root->simple_rte_array[subpart->relid];
 
-		pinfo = makeNode(PartitionPruneInfo);
-		pinfo->reloid = rte->relid;
-		pinfo->pruning_steps = pruning_steps;
-		pinfo->present_parts = present_parts;
-		pinfo->nparts = nparts;
-		pinfo->subplan_map = subplan_map;
-		pinfo->subpart_map = subpart_map;
+		prelinfo = makeNode(PartitionedRelPruneInfo);
+		prelinfo->reloid = rte->relid;
+		prelinfo->pruning_steps = pruning_steps;
+		prelinfo->present_parts = present_parts;
+		prelinfo->nparts = nparts;
+		prelinfo->subplan_map = subplan_map;
+		prelinfo->subpart_map = subpart_map;
 
 		/* Determine which pruning types should be enabled at this level */
-		doruntimeprune |= analyze_partkey_exprs(pinfo, pruning_steps,
+		doruntimeprune |= analyze_partkey_exprs(prelinfo, pruning_steps,
 												partnatts);
 
-		pinfolist = lappend(pinfolist, pinfo);
+		prelinfolist = lappend(prelinfolist, prelinfo);
 	}
 
-	pfree(relid_subplan_map);
 	pfree(relid_subpart_map);
 
+	*matchedsubplans = subplansfound;
+
+	if (hascontradictingquals)
+	{
+		/*
+		 * This shouldn't happen as the planner should have detected this
+		 * earlier. However, we do use additional quals from parameterized
+		 * paths here. These do only compare Params to the partition key, so
+		 * this shouldn't cause the discovery of any new qual contradictions
+		 * that were not previously discovered as the Param values are unknown
+		 * during planning.  Anyway, we'd better do something sane here, so
+		 * let's just disable run-time pruning.
+		 */
+		return NIL;
+	}
+
+
 	if (doruntimeprune)
-		return pinfolist;
+		return prelinfolist;
 
 	/* No run-time pruning required. */
 	return NIL;
@@ -2752,10 +2913,11 @@ pull_exec_paramids_walker(Node *node, Bitmapset **context)
  *		executor startup-time or executor run-time pruning.
  *
  * Returns true if any executor partition pruning should be attempted at this
- * level.  Also fills fields of *pinfo to record how to process each step.
+ * level.  Also fills fields of *prelinfo to record how to process each step.
  */
 static bool
-analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
+analyze_partkey_exprs(PartitionedRelPruneInfo *prelinfo, List *steps,
+					  int partnatts)
 {
 	bool		doruntimeprune = false;
 	ListCell   *lc;
@@ -2765,11 +2927,12 @@ analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 	 * Otherwise, if their expressions aren't simple Consts, they require
 	 * startup-time pruning.
 	 */
-	pinfo->nexprs = list_length(steps) * partnatts;
-	pinfo->hasexecparam = (bool *) palloc0(sizeof(bool) * pinfo->nexprs);
-	pinfo->do_initial_prune = false;
-	pinfo->do_exec_prune = false;
-	pinfo->execparamids = NULL;
+	prelinfo->nexprs = list_length(steps) * partnatts;
+	prelinfo->hasexecparam = (bool *) palloc0(sizeof(bool) *
+											  prelinfo->nexprs);
+	prelinfo->do_initial_prune = false;
+	prelinfo->do_exec_prune = false;
+	prelinfo->execparamids = NULL;
 
 	foreach(lc, steps)
 	{
@@ -2793,16 +2956,16 @@ analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 														step->step.step_id,
 														keyno);
 
-				Assert(stateidx < pinfo->nexprs);
+				Assert(stateidx < prelinfo->nexprs);
 				hasexecparams = !bms_is_empty(execparamids);
-				pinfo->hasexecparam[stateidx] = hasexecparams;
-				pinfo->execparamids = bms_join(pinfo->execparamids,
-											   execparamids);
+				prelinfo->hasexecparam[stateidx] = hasexecparams;
+				prelinfo->execparamids = bms_join(prelinfo->execparamids,
+												  execparamids);
 
 				if (hasexecparams)
-					pinfo->do_exec_prune = true;
+					prelinfo->do_exec_prune = true;
 				else
-					pinfo->do_initial_prune = true;
+					prelinfo->do_initial_prune = true;
 
 				doruntimeprune = true;
 			}
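The other_subplans computation in make_partition_pruneinfo is a set complement: take every subplan index from 0 to list_length(subpaths) - 1 and remove the ones that some partitioned rel accounted for. A minimal sketch of the same idea, assuming a 64-bit mask in place of a Bitmapset (so at most 64 subplans) and using the hypothetical name `unmatched_subplans`:

```c
#include <stdint.h>

/*
 * Return the subplan indexes in [0, nsubplans) that are absent from
 * 'matched'.  Sets are 64-bit masks here, so nsubplans must be <= 64;
 * the real code uses Bitmapsets and has no such limit.
 */
static uint64_t
unmatched_subplans(uint64_t matched, int nsubplans)
{
    uint64_t    all;

    if (nsubplans <= 0)
        return 0;

    /* Build the full range first, then delete the matched members. */
    all = (nsubplans >= 64) ? UINT64_MAX : ((UINT64_C(1) << nsubplans) - 1);

    return all & ~matched;
}
```

With four subplans of which indexes 1 and 2 were matched, the call unmatched_subplans(0x6, 4) leaves indexes 0 and 3. Those correspond to the subplans that ExecFindMatchingSubPlans adds back into its result unconditionally in the patch.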
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 862bf65060..c1bee3dd31 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -113,13 +113,13 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 /*-----------------------
- * PartitionPruningData - Per-partitioned-table data for run-time pruning
+ * PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
  * of partitions.  For a multilevel partitioned table, we have one of these
  * for the topmost partition plus one for each non-leaf child partition,
  * ordered such that parents appear before their children.
  *
  * subplan_map[] and subpart_map[] have the same definitions as in
- * PartitionPruneInfo (see plannodes.h); though note that here,
+ * PartitionedRelPruneInfo (see plannodes.h); though note that here,
  * subpart_map contains indexes into PartitionPruneState.partprunedata[].
  *
  * subplan_map					Subplan index by partition index, or -1.
@@ -136,7 +136,7 @@ typedef struct PartitionTupleRouting
  *								executor run (for this partitioning level).
  *-----------------------
  */
-typedef struct PartitionPruningData
+typedef struct PartitionedRelPruningData
 {
 	int		   *subplan_map;
 	int		   *subpart_map;
@@ -145,6 +145,17 @@ typedef struct PartitionPruningData
 	List	   *pruning_steps;
 	bool		do_initial_prune;
 	bool		do_exec_prune;
+} PartitionedRelPruningData;
+
+/*
+ * PartitionPruningData - Encapsulates an array of PartitionedRelPruningData
+ * which belong to a single partition hierarchy containing 1 or more
+ * partitions.
+ */
+typedef struct PartitionPruningData
+{
+	PartitionedRelPruningData *partrelprunedata;
+	int			num_partrelprunedata;
 } PartitionPruningData;
 
 /*-----------------------
@@ -170,6 +181,11 @@ typedef struct PartitionPruningData
  *						any of the partprunedata structs.  Pruning must be
  *						done again each time the value of one of these
  *						parameters changes.
+ * other_subplans		Contains subplan indexes which don't belong to any
+ *						'partprunedata', e.g. UNION ALL children that are not
+ *						partitioned tables or a partitioned table that the
+ *						planner deemed run-time pruning to be useless for.
+ *						These must not be pruned.
  * prune_context		A short-lived memory context in which to execute the
  *						partition pruning functions.
  *-----------------------
@@ -181,6 +197,7 @@ typedef struct PartitionPruneState
 	bool		do_initial_prune;
 	bool		do_exec_prune;
 	Bitmapset  *execparamids;
+	Bitmapset  *other_subplans;
 	MemoryContext prune_context;
 } PartitionPruneState;
 
@@ -209,7 +226,7 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 						PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-							  List *partitionpruneinfo);
+							  PartitionPruneInfo *partitionpruneinfo);
 extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 43f1552241..697d3d7a5f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -88,6 +88,7 @@ typedef enum NodeTag
 	T_NestLoopParam,
 	T_PlanRowMark,
 	T_PartitionPruneInfo,
+	T_PartitionedRelPruneInfo,
 	T_PartitionPruneStepOp,
 	T_PartitionPruneStepCombine,
 	T_PlanInvalItem,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 5201c6d4bc..b341aa7f35 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -241,6 +241,8 @@ typedef struct ModifyTable
 	List	   *exclRelTlist;	/* tlist of the EXCLUDED pseudo relation */
 } ModifyTable;
 
+struct PartitionPruneInfo;
+
 /* ----------------
  *	 Append node -
  *		Generate the concatenation of the results of sub-plans.
@@ -260,8 +262,8 @@ typedef struct Append
 	/* RT indexes of non-leaf tables in a partition tree */
 	List	   *partitioned_rels;
 
-	/* Info for run-time subplan pruning, one entry per partitioned_rels */
-	List	   *part_prune_infos;	/* List of PartitionPruneInfo */
+	/* Info for run-time subplan pruning */
+	struct PartitionPruneInfo *part_prune_info;
 } Append;
 
 /* ----------------
@@ -1051,18 +1053,34 @@ typedef struct PlanRowMark
  */
 
 /*
- * PartitionPruneInfo - Details required to allow the executor to prune
+ * PartitionPruneInfo - Details required to allow the executor to prune
  * partitions.
  *
+ * prune_infos			List of Lists containing PartitionedRelPruneInfo
+ * other_subplans		Indexes of any subplans which are not accounted for
+ *						by any of the PartitionedRelPruneInfo stored in
+ *						'prune_infos'.
+ */
+typedef struct PartitionPruneInfo
+{
+	NodeTag		type;
+	List	   *prune_infos;
+	Bitmapset  *other_subplans;
+} PartitionPruneInfo;
+
+/*
+ * PartitionedRelPruneInfo - Details required to allow the executor to prune
+ * partitions for a single partitioned table.
+ *
  * Here we store mapping details to allow translation of a partitioned table's
  * index as returned by the partition pruning code into subplan indexes for
  * plan types which support arbitrary numbers of subplans, such as Append.
  * We also store various details to tell the executor when it should be
  * performing partition pruning.
  *
- * Each PartitionPruneInfo describes the partitioning rules for a single
+ * Each PartitionedRelPruneInfo describes the partitioning rules for a single
  * partitioned table (a/k/a level of partitioning).  For a multilevel
- * partitioned table, we have a List of PartitionPruneInfos, where the
+ * partitioned table, we have a List of PartitionedRelPruneInfo, where the
  * first entry represents the topmost partitioned table and additional
  * entries represent non-leaf child partitions, ordered such that parents
  * appear before their children.
@@ -1073,11 +1091,12 @@ typedef struct PlanRowMark
  * zero-based index of the partition's subplan in the parent plan's subplan
  * list; it is -1 if the partition is non-leaf or has been pruned.  For a
  * non-leaf partition p, subpart_map[p] contains the zero-based index of
- * that sub-partition's PartitionPruneInfo in the plan's PartitionPruneInfo
- * list; it is -1 if the partition is a leaf or has been pruned.  All these
- * indexes are global across the whole partitioned table and Append plan node.
+ * that sub-partition's PartitionedRelPruneInfo in the plan's
+ * PartitionedRelPruneInfo list; it is -1 if the partition is a leaf or has
+ * been pruned.  All these indexes are global across the whole partitioned
+ * table and the parenting plan node.
  */
-typedef struct PartitionPruneInfo
+typedef struct PartitionedRelPruneInfo
 {
 	NodeTag		type;
 	Oid			reloid;			/* OID of partition rel for this level */
@@ -1095,7 +1114,7 @@ typedef struct PartitionPruneInfo
 	bool		do_exec_prune;	/* true if pruning should be performed during
 								 * executor run. */
 	Bitmapset  *execparamids;	/* All PARAM_EXEC Param IDs in pruning_steps */
-} PartitionPruneInfo;
+} PartitionedRelPruneInfo;
 
 /*
  * Abstract Node type for partition pruning steps (there are no concrete
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 9944d2832f..df3bcb737d 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -74,7 +74,8 @@ typedef struct PartitionPruneContext
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
 	((partnatts) * (step_id) + (keyno))
 
-extern List *make_partition_pruneinfo(PlannerInfo *root,
+extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root,
+						 RelOptInfo *parentrel,
 						 List *partitioned_rels,
 						 List *subpaths, List *prunequal);
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 9059147e17..65b979eb6a 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2326,6 +2326,96 @@ select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1
                Index Cond: (a = $0)
 (52 rows)
 
+-- Test run-time partition pruning with UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all select * from ab) ab where b = (select 1);
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Append (actual rows=0 loops=1)
+         ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_b1_1 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_b2_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_b3_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Index Cond: (a = 1)
+   ->  Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b3 (never executed)
+         Filter: (b = $0)
+(37 rows)
+
+-- A case containing a UNION ALL with a non-partitioned child.
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all (values(10,5)) union all select * from ab) ab where b = (select 1);
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Append (actual rows=0 loops=1)
+         ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_b1_1 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_b2_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_b3_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Index Cond: (a = 1)
+   ->  Result (actual rows=0 loops=1)
+         One-Time Filter: (5 = $0)
+   ->  Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b3 (never executed)
+         Filter: (b = $0)
+(39 rows)
+
 deallocate ab_q1;
 deallocate ab_q2;
 deallocate ab_q3;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 11b92bfada..c5203f1c95 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -526,6 +526,14 @@ reset max_parallel_workers_per_gather;
 explain (analyze, costs off, summary off, timing off)
 select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1 from lprt_a);
 
+-- Test run-time partition pruning with UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all select * from ab) ab where b = (select 1);
+
+-- A case containing a UNION ALL with a non-partitioned child.
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all (values(10,5)) union all select * from ab) ab where b = (select 1);
+
 deallocate ab_q1;
 deallocate ab_q2;
 deallocate ab_q3;
-- 
2.16.2.windows.1

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#15)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018-Jun-27, David Rowley wrote:

I've only just completed reading back through all the code and I think
it's correct. I ended up changing add_paths_to_append_rel() so that
instead of performing concatenation on partitioned_rels from two UNION
ALL children, it creates a List of lists. I believe this list can
only end up with a 2-level hierarchy of partitioned rels. I tested
this and it seems to work as I expect, although I think I need to
study the code a bit more to ensure it can't happen. I need to check
if there's some cases where nested UNION ALLs cannot be flattened to
have a single UNION ALL parent. Supporting this did cause me to have
to check the List->type in a few places. I only saw one other place in
the code where this is done, so I figured that meant it was okay.

Checking a node's ->type member is not idiomatic -- use IsA() for that.
(Strangely, we have IsIntegerList() but only for use within list.c.) But
do we rely on the ordering of these lists anywhere? I'm wondering why
it's not more sensible to use a bitmapset there (I guess for your
list-of-lists business you'd have a list of bms's).

I didn't look at your patch much further.

Since Tom has been revamping this code lately, I think it's a good
idea to wait for his input.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
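
For readers unfamiliar with the idiom Alvaro mentions, here is a minimal, self-contained sketch of tag-based node checking. The two macros follow the general shape of PostgreSQL's nodeTag() and IsA(), but the enum values, the DemoList struct and demo_is_int_list() are simplified stand-ins invented for this illustration, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for PostgreSQL's NodeTag enum (values are illustrative). */
typedef enum NodeTag
{
	T_Invalid = 0,
	T_List,
	T_IntList
} NodeTag;

/* Every node struct starts with a NodeTag, so any node can be viewed as a Node. */
typedef struct Node
{
	NodeTag		type;
} Node;

/* Hypothetical, cut-down list node; the real List has more fields. */
typedef struct DemoList
{
	NodeTag		type;
	int			length;
} DemoList;

#define nodeTag(nodeptr)		(((const Node *) (nodeptr))->type)
#define IsA(nodeptr, _type_)	(nodeTag(nodeptr) == T_##_type_)

/* Returns 1 when the list holds integers, using IsA() rather than ->type. */
static int
demo_is_int_list(const DemoList *list)
{
	assert(list != NULL);
	return IsA(list, IntList);
}
```

The point of the macro is that call sites name the node type they expect instead of comparing the tag field by hand, which keeps such checks uniform and easy to grep for.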

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#16)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Since Tom has been revamping this code lately, I think it's a good
idea to wait for his input.

I'm on vacation and won't have time to look at this until week after
next. If you don't mind putting the topic on hold that long, I'll
be happy to take responsibility for it.

regards, tom lane

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#17)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Hi Tom,

On 2018-06-29 18:17:08 -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Since Tom has been revamping this code lately, I think it's a good
idea to wait for his input.

I'm on vacation and won't have time to look at this until week after
next. If you don't mind putting the topic on hold that long, I'll
be happy to take responsibility for it.

Is that still the plan? Do you foresee any issues here? This has been
somewhat of a longstanding open item...

Greetings,

Andres Freund

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Andres Freund <andres@anarazel.de> writes:

On 2018-06-29 18:17:08 -0400, Tom Lane wrote:

I'm on vacation and won't have time to look at this until week after
next. If you don't mind putting the topic on hold that long, I'll
be happy to take responsibility for it.

Is that still the plan? Do you foresee any issues here? This has been
somewhat of a longstanding open item...

It's on my to-do list, but I'm still catching up vacation backlog.
Is this item blocking anyone?

regards, tom lane

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#19)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018-Jul-12, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-06-29 18:17:08 -0400, Tom Lane wrote:

I'm on vacation and won't have time to look at this until week after
next. If you don't mind putting the topic on hold that long, I'll
be happy to take responsibility for it.

Is that still the plan? Do you foresee any issues here? This has been
somewhat of a longstanding open item...

It's on my to-do list, but I'm still catching up vacation backlog.
Is this item blocking anyone?

I don't think so. The patch might conflict with other fixes, but I
suppose that's a fact of life.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#15)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

David Rowley <david.rowley@2ndquadrant.com> writes:

Anyway... Patch attached. This is method 3 of the 3 methods I thought
to fix this, so if this is not suitable then I'm out of ideas.

I started to look at this patch. I think this is basically the right
direction to go in, but I'm not terribly happy with the details of the
data structure design.

First off, given that we're now going to have a Plan data structure
that accurately reflects the partition hierarchy, I wonder whether we
couldn't get rid of the fiddling around with lists of ints and lists of
lists of ints; aren't those basically duplicative? They'd certainly be
so if we add the rel's RT index to PartitionedRelPruneInfo, but maybe
they are even without that. Alvaro complained about the code associated
with those lists not being very idiomatic, but I'd like to just get rid
of the lists entirely. I especially don't care for keeping the implicit
ordering assumptions in those lists (parents before children) when the
same info is now going to be explicit in a nearby structure. (This
ties into the remarks I just made to Amit about getting rid of
executor-startup lock-taking logic. Most of that change only makes
sense to do in v12, but I'd prefer that this patch not double down on
reliance on data structures we'd like to get rid of.)

Second, I still feel that you've got the sense of the prunable-subplans
bitmaps backwards, and I do not buy the micro-optimization argument you
made for doing it like that. It complicates the code, yet the cost of
one bit in a bitmap is completely negligible compared to all the other
costs involved in having a per-partition subplan, whether or not that
subplan actually gets used in a particular run.

One very minor quibble is that I'd be inclined to represent the
PartitionPruningData struct like this:

typedef struct PartitionPruningData
{
    int         num_partrelprunedata;
    PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruningData;

so we can allocate it with one palloc not two. Also, in the Plan
representation, I'd suggest avoiding situations where a data structure
is sometimes a List of Lists of PartitionedRelPruneInfo and sometimes
just a single-level List. Saving one ListCell isn't worth the code
complexity and error-proneness of that.

regards, tom lane
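
The single-allocation layout Tom suggests can be sketched as follows. This is only an illustration under stated assumptions: malloc() stands in for palloc() so the code compiles outside PostgreSQL, the plain C99 `[]` syntax replaces the FLEXIBLE_ARRAY_MEMBER macro, and the Demo* types and demo_make_pruning_data() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-table payload; the real struct has many more fields. */
typedef struct DemoRelPruningData
{
	int			nparts;
} DemoRelPruningData;

/* Header plus trailing array, allocated as one chunk. */
typedef struct DemoPruningData
{
	int			num_partrelprunedata;
	DemoRelPruningData partrelprunedata[];	/* C99 flexible array member */
} DemoPruningData;

/*
 * Allocate the header and n array elements with a single call.  malloc()
 * stands in for palloc() here.  Returns NULL on allocation failure; the
 * caller releases the whole structure with one free().
 */
static DemoPruningData *
demo_make_pruning_data(int n)
{
	size_t		size;
	DemoPruningData *result;

	assert(n >= 0);
	size = offsetof(DemoPruningData, partrelprunedata) +
		sizeof(DemoRelPruningData) * (size_t) n;
	result = malloc(size);
	if (result == NULL)
		return NULL;

	result->num_partrelprunedata = n;
	for (int i = 0; i < n; i++)
		result->partrelprunedata[i].nparts = 0;
	return result;
}
```

One consequence of this layout is that the struct no longer has a fixed size, so several of them cannot be packed into a single plain array; a container has to hold pointers to them instead.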

#22David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#21)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 16 July 2018 at 06:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

First off, given that we're now going to have a Plan data structure
that accurately reflects the partition hierarchy, I wonder whether we
couldn't get rid of the fiddling around with lists of ints and lists of
lists of ints; aren't those basically duplicative?

I'm a bit confused by this. I get that you're talking about the
partitioned_rels List, but without that list then how would
make_partition_pruneinfo() know what the subnode's parents are?
Perhaps we could add a relid field in RelOptInfo to mark the direct
parent of a rel, but that does not make getting a unique list very quick
when given a List of subplans. A Bitmapset could be used, but we'll
end up with a mixed hierarchy with UNION ALL parents. Unsure how to
separate those again without some complex processing.

I'm not objecting to improving this as I don't really like that list,
but I just can't quite think of how else to represent the partition
hierarchy.

They'd certainly be
so if we add the rel's RT index to PartitionedRelPruneInfo, but maybe
they are even without that. Alvaro complained about the code associated
with those lists not being very idiomatic, but I'd like to just get rid
of the lists entirely. I especially don't care for keeping the implicit
ordering assumptions in those lists (parents before children) when the
same info is now going to be explicit in a nearby structure. (This
ties into the remarks I just made to Amit about getting rid of
executor-startup lock-taking logic. Most of that change only makes
sense to do in v12, but I'd prefer that this patch not double down on
reliance on data structures we'd like to get rid of.)

Right, but I need to use something for v11. Do you want to see two
separate patches here? If we're not going to change this in v11 then
I still need to use something to describe the partition hierarchy so
that make_partition_pruneinfo() knows how to build the data structures
for run-time pruning.

Second, I still feel that you've got the sense of the prunable-subplans
bitmaps backwards, and I do not buy the micro-optimization argument you
made for doing it like that. It complicates the code, yet the cost of
one bit in a bitmap is completely negligible compared to all the other
costs involved in having a per-partition subplan, whether or not that
subplan actually gets used in a particular run.

But get_matching_partitions() returns the set of matching partitions,
not the set that does not match. It sounds like doing it the way you
ask would require inverting the Bitmapset returned by that. I don't
quite understand why you think this is more simple to implement. I
can't see how we'd map the non-matching partitions into subplan
indexes for the Append node. Can you give a bit more detail on your
design for this?

One very minor quibble is that I'd be inclined to represent the
PartitionPruningData struct like this:

typedef struct PartitionPruningData
{
    int         num_partrelprunedata;
    PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruningData;

so we can allocate it with one palloc not two.

That could be done, sort of. Only I currently allocate the array which
stores these PartitionPruningDatas as one chunk of memory. I can do
that today because the PartitionPruningData struct is a fixed size. If
you want to make it have a flexible size then I'd need to allocate an
array of pointers in the PartitionPruningState... or use a
FLEXIBLE_ARRAY_MEMBER of pointers there too. I've done it that way
locally for now.

Also, in the Plan
representation, I'd suggest avoiding situations where a data structure
is sometimes a List of Lists of PartitionedRelPruneInfo and sometimes
just a single-level List. Saving one ListCell isn't worth the code
complexity and error-proneness of that.

I'll make that change. But I'm confused; was the first thing you
mentioned above not to get rid of this list?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
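
The mapping from matching partitions to subplan indexes that David refers to can be sketched as below. This is a simplified illustration, not the patch's find_matching_subplans_recurse(): a uint64_t stands in for a Bitmapset, the sets of matching partitions are passed in precomputed rather than obtained from get_matching_partitions(), and all Demo*/demo_* names are hypothetical. The meaning of subplan_map, subpart_map and other_subplans follows the comments in plannodes.h from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_PARTS 8

/* Hypothetical, simplified pruning info for one partitioned table. */
typedef struct DemoRelPruneInfo
{
	int			nparts;
	int			subplan_map[DEMO_MAX_PARTS];	/* subplan index, or -1 */
	int			subpart_map[DEMO_MAX_PARTS];	/* index of child info, or -1 */
} DemoRelPruneInfo;

/*
 * Translate the set of matching partitions of infos[idx] into a set of
 * subplan indexes, descending into sub-partitioned tables.  matches[i] is
 * the set of matching partitions for infos[i].
 */
static uint64_t
demo_matching_subplans(const DemoRelPruneInfo *infos, const uint64_t *matches,
					   int idx)
{
	const DemoRelPruneInfo *info = &infos[idx];
	uint64_t	result = 0;

	assert(info->nparts <= DEMO_MAX_PARTS);
	for (int p = 0; p < info->nparts; p++)
	{
		if ((matches[idx] & (UINT64_C(1) << p)) == 0)
			continue;			/* partition p did not match */

		if (info->subplan_map[p] >= 0)
			result |= UINT64_C(1) << info->subplan_map[p];
		else if (info->subpart_map[p] >= 0)
			result |= demo_matching_subplans(infos, matches,
											 info->subpart_map[p]);
		/* both maps are -1: the planner already pruned this partition */
	}
	return result;
}

/* Subplans that belong to no partition hierarchy are always kept. */
static uint64_t
demo_valid_subplans(const DemoRelPruneInfo *infos, const uint64_t *matches,
					uint64_t other_subplans)
{
	return demo_matching_subplans(infos, matches, 0) | other_subplans;
}
```

With a two-level hierarchy where partition 0 of the top table is a leaf (subplan 1), partition 1 is sub-partitioned (subplans 2 and 3), and subplan 0 is a non-partitioned UNION ALL child, a match on only the first sub-partition yields subplans 0 and 2.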

#23David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#21)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 16 July 2018 at 06:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, in the Plan
representation, I'd suggest avoiding situations where a data structure
is sometimes a List of Lists of PartitionedRelPruneInfo and sometimes
just a single-level List. Saving one ListCell isn't worth the code
complexity and error-proneness of that.

Thinking about this some more, I don't quite see any reason that the
partitioned_rels for a single hierarchy couldn't just be a Bitmapset
instead of an IntList.

A parent partition is always going to have a lower relid than its
children, so that means that the top level parent will just have the
lowest member in the set.

There's already code in the inheritance_planner which rebuilds the
IntList from a Bitmapset:

    while ((i = bms_next_member(partitioned_relids, i)) >= 0)
        partitioned_rels = lappend_int(partitioned_rels, i);

ExecLockNonLeafAppendTables could be made to accept a Bitmapset rather
than a List. In fact, we could probably get rid of the nested loops if
we did it that way.

What do you think?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
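
The ordering property this proposal relies on can be illustrated with a small stand-alone sketch. It imitates the calling convention of bms_next_member() as used in the loop quoted above, but a uint64_t stands in for a Bitmapset and both demo_* functions are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the smallest member greater than prevbit, or -1 if there is none.
 * Pass -1 to start the scan.
 */
static int
demo_next_member(uint64_t set, int prevbit)
{
	assert(prevbit >= -1);
	for (int bit = prevbit + 1; bit < 64; bit++)
	{
		if (set & (UINT64_C(1) << bit))
			return bit;
	}
	return -1;
}

/*
 * Copy the members of set into out[] in ascending order and return how many
 * were written.  out must have room for 64 entries.
 */
static int
demo_set_to_array(uint64_t set, int *out)
{
	int			n = 0;
	int			i = -1;

	while ((i = demo_next_member(set, i)) >= 0)
		out[n++] = i;
	return n;
}
```

Because members always come out in ascending order, a hierarchy in which every parent has a lower index than its children is visited parents first, whatever order the members were added in.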

#24David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#23)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 16 July 2018 at 12:55, David Rowley <david.rowley@2ndquadrant.com> wrote:

Thinking about this some more, I don't quite see any reason that the
partitioned_rels for a single hierarchy couldn't just be a Bitmapset
instead of an IntList.

Of course, this is not possible since we can't pass a List of
Bitmapsets to the executor due to Bitmapset not being a node type.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#25Phil Florent
philflorent@hotmail.com
In reply to: David Rowley (#23)
RE: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Hi,

I should probably post this in the general section, but I am confused by the sentence "A parent partition is always going to have a lower relid than its children".

In the code, does relid have several meanings, and not only "OID of a class" as in /messages/by-id/20910.1111734343@sss.pgh.pa.us ?

In fact, I want to be sure I can tell the developers that they will always be able to create tables and partitions in any order:

create table child1(c1 int, c2 int);

create table midparent1(c1 int, c2 int) partition by list(c2);

alter table midparent1 attach partition child1 for values in (1);

create table child2 partition of midparent1 for values in (2);

create table topparent(c1 int, c2 int) partition by list(c1);

alter table topparent attach partition midparent1 for values in (1);

select relname, relkind, oid from pg_class where relname in ('topparent', 'midparent1', 'child1', 'child2') order by oid asc;

  relname   | relkind |  oid
------------+---------+--------
 child1     | r       | 123989
 midparent1 | p       | 123992
 child2     | r       | 123995
 topparent  | p       | 123998
(4 rows)

Regards
Phil

#26David Rowley
david.rowley@2ndquadrant.com
In reply to: Phil Florent (#25)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 16 July 2018 at 16:56, Phil Florent <philflorent@hotmail.com> wrote:

I should post that in the general section but I am confused by the
sentence "A parent partition is always going to have a lower relid than
its children"

It's a little confusing since RelOptInfo has a relid field and so does
RangeTblEntry. They both have completely different meanings. RelOptInfo's
relid is a number starting at 1 and continues in a gapless sequence
increasing by 1 with each RelOptInfo. These relids are completely internal
to the server and don't appear in the system catalog tables.
RangeTblEntry's relid is what's in pg_class.oid.

I was talking about RelOptInfo's relid.

Using relids starting at 1 is quite convenient for allowing direct array
lookups in various data structures in the planner. However it's also
required to uniquely identify a relation as a single table may appear many
times in a query, so trying to identify them by their oid could be
ambiguous. Also, some RTEKinds don't have storage, e.g. a VALUES() clause.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
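
The distinction David draws can be made concrete with a small sketch. Everything here is a simplified assumption for illustration: DemoRangeTblEntry is a cut-down stand-in for the real range table entry, the OID value is borrowed from Phil's listing above purely as sample data, and the demo_* functions are hypothetical.

```c
#include <assert.h>

typedef unsigned int DemoOid;

/* Hypothetical cut-down range table entry; relid here is the catalog OID. */
typedef struct DemoRangeTblEntry
{
	DemoOid		relid;			/* pg_class.oid, or 0 when there is no table */
} DemoRangeTblEntry;

/*
 * Count how many range table entries refer to the given OID.  A result
 * above 1 means the OID alone cannot identify one relation in the query.
 */
static int
demo_count_oid(const DemoRangeTblEntry *rtable, int nrels, DemoOid oid)
{
	int			count = 0;

	for (int i = 0; i < nrels; i++)
	{
		if (rtable[i].relid == oid)
			count++;
	}
	return count;
}

/* Look up an entry by its 1-based planner index (a direct array access). */
static const DemoRangeTblEntry *
demo_rt_fetch(const DemoRangeTblEntry *rtable, int nrels, int rti)
{
	assert(rti >= 1 && rti <= nrels);
	return &rtable[rti - 1];
}
```

For a query that reads the same table twice with a VALUES() clause in between, the table's OID appears at indexes 1 and 3 while index 2 has no OID at all; only the 1-based index identifies each relation unambiguously.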

#27Phil Florent
philflorent@hotmail.com
In reply to: David Rowley (#26)
RE: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

I get it. Thank you for the clarification.

Regards

Phil

________________________________
From: David Rowley <david.rowley@2ndquadrant.com>
Sent: Monday, 16 July 2018 07:48
To: Phil Florent
Cc: Tom Lane; Robert Haas; Amit Langote; PostgreSQL Hackers
Subject: Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 16 July 2018 at 16:56, Phil Florent <philflorent@hotmail.com> wrote:

I should post that in the general section but I am confused by the sentence "A parent partition is always going to have a lower relid than its children"

It's a little confusing since RelOptInfo has a relid field and so does RangeTblEntry. They both have completely different meanings. RelOptInfo's relid is a number starting at 1 and continues in a gapless sequence increasing by 1 with each RelOptInfo. These relids are completely internal to the server and don't appear in the system catalog tables. RangeTblEntry's relid is what's in pg_class.oid.

I was talking about RelOptInfo's relid.

Using relids starting at 1 is quite convenient for allowing direct array lookups in various data structures in the planner. However it's also required to uniquely identify a relation as a single table may appear many times in a query, so trying to identify them by their oid could be ambiguous. Also, some RTEKinds don't have storage, e.g a VALUES() clause.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#28David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#21)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 16 July 2018 at 06:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I started to look at this patch. I think this is basically the right
direction to go in, but I'm not terribly happy with the details of the
data structure design.

I've made an attempt at addressing the issues that I understood.

I've not done anything about your Bitmapset for non-matching
partitions. I fail to see how that would improve the code. But please
feel free to provide details of your design and I'll review it and let
you know what I think about it.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

v2-0001-Fix-run-time-partition-pruning-for-UNION-ALL-pare.patch (application/octet-stream)
From 85972785b6ccb0e1307f38eadbf19121aaed0c9d Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 26 Jun 2018 23:49:31 +1200
Subject: [PATCH v2] Fix run-time partition pruning for UNION ALL parents

The run-time partition pruning code added in 499be013d was unaware that the
partitioned_rels list built during add_paths_to_append_rel() can be
non-empty for relations other than partitioned tables.  It can also
be set for UNION ALL parents where one or more union children are
partitioned tables.  This can cause the partitioned_rels list to end up
with the relids from multiple partition hierarchies mixed
together.

This commit resolves that issue by never mixing the relids from different
UNION ALL children.  Instead we maintain a List of Lists containing the
partitioned relids.  This commit also adds all the new code required in
both the planner and executor to allow run-time pruning to work for UNION
ALL parents which query multiple partitioned tables.
---
 src/backend/executor/execPartition.c          | 394 +++++++++++++++-----------
 src/backend/executor/nodeAppend.c             |   4 +-
 src/backend/nodes/copyfuncs.c                 |  16 +-
 src/backend/nodes/outfuncs.c                  |  16 +-
 src/backend/nodes/readfuncs.c                 |  15 +-
 src/backend/optimizer/path/allpaths.c         |  27 +-
 src/backend/optimizer/plan/createplan.c       |  51 +++-
 src/backend/optimizer/plan/planner.c          |   1 +
 src/backend/partitioning/partprune.c          | 249 ++++++++++++----
 src/include/executor/execPartition.h          |  33 ++-
 src/include/nodes/nodes.h                     |   1 +
 src/include/nodes/plannodes.h                 |  39 ++-
 src/include/partitioning/partprune.h          |   3 +-
 src/test/regress/expected/partition_prune.out |  90 ++++++
 src/test/regress/sql/partition_prune.sql      |   8 +
 15 files changed, 672 insertions(+), 275 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7a4665cc4e..dac789d414 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -48,8 +48,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 bool *isnull,
 									 int maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_matching_subplans_recurse(PartitionPruneState *prunestate,
-							   PartitionPruningData *pprune,
+static void find_matching_subplans_recurse(PartitionPruningData *pprune,
+							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans);
 
@@ -1394,34 +1394,42 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  *
  * 'planstate' is the parent plan node's execution state.
  *
- * 'partitionpruneinfo' is a List of PartitionPruneInfos as generated by
+ * 'partitionpruneinfo' is a PartitionPruneInfo as generated by
  * make_partition_pruneinfo.  Here we build a PartitionPruneState containing a
- * PartitionPruningData for each item in that List.  This data can be re-used
- * each time we re-evaluate which partitions match the pruning steps provided
- * in each PartitionPruneInfo.
+ * PartitionPruningData for each partitionpruneinfo->prune_infos, in
+ * turn, a PartitionedRelPruningData is created for each
+ * PartitionedRelPruneInfo stored in each 'prune_infos'.  This two-level system
+ * is required in order to support run-time pruning with UNION ALL parents
+ * containing one or more partitioned tables as children.  The data stored in
+ * each PartitionedRelPruningData can be re-used each time we re-evaluate
+ * which partitions match the pruning steps provided in each
+ * PartitionedRelPruneInfo.
  */
 PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate,
+							  PartitionPruneInfo *partitionpruneinfo)
 {
 	PartitionPruneState *prunestate;
-	PartitionPruningData *prunedata;
 	ListCell   *lc;
+	int			n_part_hierarchies;
 	int			i;
 
-	Assert(partitionpruneinfo != NIL);
+	Assert(partitionpruneinfo != NULL);
+
+	n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
 
 	/*
 	 * Allocate the data structure
 	 */
-	prunestate = (PartitionPruneState *) palloc(sizeof(PartitionPruneState));
-	prunedata = (PartitionPruningData *)
-		palloc(sizeof(PartitionPruningData) * list_length(partitionpruneinfo));
+	prunestate = (PartitionPruneState *)
+		palloc(offsetof(PartitionPruneState, partprunedata) +
+			   sizeof(PartitionPruningData *) * n_part_hierarchies);
 
-	prunestate->partprunedata = prunedata;
-	prunestate->num_partprunedata = list_length(partitionpruneinfo);
+	prunestate->num_partprunedata = n_part_hierarchies;
 	prunestate->do_initial_prune = false;	/* may be set below */
 	prunestate->do_exec_prune = false;	/* may be set below */
 	prunestate->execparamids = NULL;
+	prunestate->other_subplans = bms_copy(partitionpruneinfo->other_subplans);
 
 	/*
 	 * Create a short-term memory context which we'll use when making calls to
@@ -1435,113 +1443,129 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 							  ALLOCSET_DEFAULT_SIZES);
 
 	i = 0;
-	foreach(lc, partitionpruneinfo)
+	foreach(lc, partitionpruneinfo->prune_infos)
 	{
-		PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc));
-		PartitionPruningData *pprune = &prunedata[i];
-		PartitionPruneContext *context = &pprune->context;
-		PartitionDesc partdesc;
-		PartitionKey partkey;
-		int			partnatts;
-		int			n_steps;
 		ListCell   *lc2;
-
-		/*
-		 * We must copy the subplan_map rather than pointing directly to the
-		 * plan's version, as we may end up making modifications to it later.
-		 */
-		pprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
-		memcpy(pprune->subplan_map, pinfo->subplan_map,
-			   sizeof(int) * pinfo->nparts);
-
-		/* We can use the subpart_map verbatim, since we never modify it */
-		pprune->subpart_map = pinfo->subpart_map;
-
-		/* present_parts is also subject to later modification */
-		pprune->present_parts = bms_copy(pinfo->present_parts);
-
-		/*
-		 * We need to hold a pin on the partitioned table's relcache entry so
-		 * that we can rely on its copies of the table's partition key and
-		 * partition descriptor.  We need not get a lock though; one should
-		 * have been acquired already by InitPlan or
-		 * ExecLockNonLeafAppendTables.
-		 */
-		context->partrel = relation_open(pinfo->reloid, NoLock);
-
-		partkey = RelationGetPartitionKey(context->partrel);
-		partdesc = RelationGetPartitionDesc(context->partrel);
-		n_steps = list_length(pinfo->pruning_steps);
-
-		context->strategy = partkey->strategy;
-		context->partnatts = partnatts = partkey->partnatts;
-		context->nparts = pinfo->nparts;
-		context->boundinfo = partdesc->boundinfo;
-		context->partcollation = partkey->partcollation;
-		context->partsupfunc = partkey->partsupfunc;
-
-		/* We'll look up type-specific support functions as needed */
-		context->stepcmpfuncs = (FmgrInfo *)
-			palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
-
-		context->ppccontext = CurrentMemoryContext;
-		context->planstate = planstate;
-
-		/* Initialize expression state for each expression we need */
-		context->exprstates = (ExprState **)
-			palloc0(sizeof(ExprState *) * n_steps * partnatts);
-		foreach(lc2, pinfo->pruning_steps)
+		List	   *partrelpruneinfos = lfirst_node(List, lc);
+		PartitionPruningData *prunedata;
+		int			npartrelpruneinfos = list_length(partrelpruneinfos);
+		int			j;
+
+		prunedata = palloc(offsetof(PartitionPruningData, partrelprunedata) +
+						   npartrelpruneinfos * sizeof(PartitionedRelPruningData));
+		prunestate->partprunedata[i] = prunedata;
+		prunedata->num_partrelprunedata = npartrelpruneinfos;
+
+		j = 0;
+		foreach(lc2, partrelpruneinfos)
 		{
-			PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc2);
+			PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc2));
+			PartitionedRelPruningData *prelprune = &prunedata->partrelprunedata[j];
+			PartitionPruneContext *context = &prelprune->context;
+			PartitionDesc partdesc;
+			PartitionKey partkey;
+			int			partnatts;
+			int			n_steps;
 			ListCell   *lc3;
-			int			keyno;
 
-			/* not needed for other step kinds */
-			if (!IsA(step, PartitionPruneStepOp))
-				continue;
+			/*
+			 * We must copy the subplan_map rather than pointing directly to
+			 * the plan's version, as we may end up making modifications to it
+			 * later.
+			 */
+			prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
+			memcpy(prelprune->subplan_map, pinfo->subplan_map,
+				   sizeof(int) * pinfo->nparts);
 
-			Assert(list_length(step->exprs) <= partnatts);
+			/* We can use the subpart_map verbatim, since we never modify it */
+			prelprune->subpart_map = pinfo->subpart_map;
 
-			keyno = 0;
-			foreach(lc3, step->exprs)
+			/* present_parts is also subject to later modification */
+			prelprune->present_parts = bms_copy(pinfo->present_parts);
+
+			/*
+			 * We need to hold a pin on the partitioned table's relcache entry
+			 * so that we can rely on its copies of the table's partition key
+			 * and partition descriptor.  We need not get a lock though; one
+			 * should have been acquired already by InitPlan or
+			 * ExecLockNonLeafAppendTables.
+			 */
+			context->partrel = relation_open(pinfo->reloid, NoLock);
+
+			partkey = RelationGetPartitionKey(context->partrel);
+			partdesc = RelationGetPartitionDesc(context->partrel);
+			n_steps = list_length(pinfo->pruning_steps);
+
+			context->strategy = partkey->strategy;
+			context->partnatts = partnatts = partkey->partnatts;
+			context->nparts = pinfo->nparts;
+			context->boundinfo = partdesc->boundinfo;
+			context->partcollation = partkey->partcollation;
+			context->partsupfunc = partkey->partsupfunc;
+
+			/* We'll look up type-specific support functions as needed */
+			context->stepcmpfuncs = (FmgrInfo *)
+				palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
+
+			context->ppccontext = CurrentMemoryContext;
+			context->planstate = planstate;
+
+			/* Initialize expression state for each expression we need */
+			context->exprstates = (ExprState **)
+				palloc0(sizeof(ExprState *) * n_steps * partnatts);
+			foreach(lc3, pinfo->pruning_steps)
 			{
-				Expr	   *expr = (Expr *) lfirst(lc3);
+				PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc3);
+				ListCell   *lc4;
+				int			keyno;
 
-				/* not needed for Consts */
-				if (!IsA(expr, Const))
+				/* not needed for other step kinds */
+				if (!IsA(step, PartitionPruneStepOp))
+					continue;
+
+				Assert(list_length(step->exprs) <= partnatts);
+
+				keyno = 0;
+				foreach(lc4, step->exprs)
 				{
-					int			stateidx = PruneCxtStateIdx(partnatts,
-															step->step.step_id,
-															keyno);
+					Expr	   *expr = (Expr *) lfirst(lc4);
+
+					/* not needed for Consts */
+					if (!IsA(expr, Const))
+					{
+						int			stateidx = PruneCxtStateIdx(partnatts,
+																step->step.step_id,
+																keyno);
 
-					context->exprstates[stateidx] =
-						ExecInitExpr(expr, context->planstate);
+						context->exprstates[stateidx] =
+							ExecInitExpr(expr, context->planstate);
+					}
+					keyno++;
 				}
-				keyno++;
 			}
-		}
 
-		/* Array is not modified at runtime, so just point to plan's copy */
-		context->exprhasexecparam = pinfo->hasexecparam;
+			/* Array is not modified at runtime, so just point to plan's copy */
+			context->exprhasexecparam = pinfo->hasexecparam;
 
-		pprune->pruning_steps = pinfo->pruning_steps;
-		pprune->do_initial_prune = pinfo->do_initial_prune;
-		pprune->do_exec_prune = pinfo->do_exec_prune;
+			prelprune->pruning_steps = pinfo->pruning_steps;
+			prelprune->do_initial_prune = pinfo->do_initial_prune;
+			prelprune->do_exec_prune = pinfo->do_exec_prune;
 
-		/* Record if pruning would be useful at any level */
-		prunestate->do_initial_prune |= pinfo->do_initial_prune;
-		prunestate->do_exec_prune |= pinfo->do_exec_prune;
+			/* Record if pruning would be useful at any level */
+			prunestate->do_initial_prune |= pinfo->do_initial_prune;
+			prunestate->do_exec_prune |= pinfo->do_exec_prune;
 
-		/*
-		 * Accumulate the IDs of all PARAM_EXEC Params affecting the
-		 * partitioning decisions at this plan node.
-		 */
-		prunestate->execparamids = bms_add_members(prunestate->execparamids,
-												   pinfo->execparamids);
+			/*
+			 * Accumulate the IDs of all PARAM_EXEC Params affecting the
+			 * partitioning decisions at this plan node.
+			 */
+			prunestate->execparamids = bms_add_members(prunestate->execparamids,
+													   pinfo->execparamids);
 
+			j++;
+		}
 		i++;
 	}
-
 	return prunestate;
 }
 
@@ -1555,13 +1579,17 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 void
 ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 {
+	PartitionPruningData **partprunedata = prunestate->partprunedata;
 	int			i;
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune = &prunestate->partprunedata[i];
+		PartitionPruningData *pprune = partprunedata[i];
+		PartitionedRelPruningData *prunedata = pprune->partrelprunedata;
+		int			j;
 
-		relation_close(pprune->context.partrel, NoLock);
+		for (j = 0; j < pprune->num_partrelprunedata; j++)
+			relation_close(prunedata[j].context.partrel, NoLock);
 	}
 }
 
@@ -1581,31 +1609,42 @@ ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 Bitmapset *
 ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 {
-	PartitionPruningData *pprune;
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
+	int			i;
 
 	Assert(prunestate->do_initial_prune);
 
-	pprune = prunestate->partprunedata;
-
 	/*
 	 * Switch to a temp context to avoid leaking memory in the executor's
 	 * memory context.
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	/* Perform pruning without using PARAM_EXEC Params */
-	find_matching_subplans_recurse(prunestate, pprune, true, &result);
+	for (i = 0; i < prunestate->num_partprunedata; i++)
+	{
+		PartitionPruningData *pprune;
+		PartitionedRelPruningData *prelprune;
+
+		pprune = prunestate->partprunedata[i];
+		prelprune = &pprune->partrelprunedata[0];
+
+		/* Perform pruning without using PARAM_EXEC Params */
+		find_matching_subplans_recurse(pprune, prelprune, true, &result);
+
+		/* Expression eval may have used space in node's ps_ExprContext too */
+		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
 	result = bms_copy(result);
 
+	/* Add in any subplans which partition pruning didn't account for */
+	result = bms_add_members(result, prunestate->other_subplans);
+
 	MemoryContextReset(prunestate->prune_context);
-	/* Expression eval may have used space in node's ps_ExprContext too */
-	ResetExprContext(pprune->context.planstate->ps_ExprContext);
 
 	/*
 	 * If any subplans were pruned, we must re-sequence the subplan indexes so
@@ -1633,59 +1672,70 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 		}
 
 		/*
-		 * Now we can update each PartitionPruneInfo's subplan_map with new
-		 * subplan indexes.  We must also recompute its present_parts bitmap.
-		 * We perform this loop in back-to-front order so that we determine
-		 * present_parts for the lowest-level partitioned tables first.  This
-		 * way we can tell whether a sub-partitioned table's partitions were
-		 * entirely pruned so we can exclude that from 'present_parts'.
+		 * Now we can update each PartitionedRelPruneInfo's subplan_map with
+		 * new subplan indexes.  We must also recompute its present_parts
+		 * bitmap.  We perform this loop in back-to-front order so that we
+		 * determine present_parts for the lowest-level partitioned tables
+		 * first.  This way we can tell whether a sub-partitioned table's
+		 * partitions were entirely pruned so we can exclude that from
+		 * 'present_parts'.
 		 */
-		for (i = prunestate->num_partprunedata - 1; i >= 0; i--)
+
+		for (i = 0; i < prunestate->num_partprunedata; i++)
 		{
-			int			nparts;
 			int			j;
+			PartitionPruningData *prunedata;
 
-			pprune = &prunestate->partprunedata[i];
-			nparts = pprune->context.nparts;
-			/* We just rebuild present_parts from scratch */
-			bms_free(pprune->present_parts);
-			pprune->present_parts = NULL;
+			prunedata = prunestate->partprunedata[i];
 
-			for (j = 0; j < nparts; j++)
+			for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
 			{
-				int			oldidx = pprune->subplan_map[j];
-				int			subidx;
+				PartitionedRelPruningData *pprune;
+				int			nparts;
+				int			k;
 
-				/*
-				 * If this partition existed as a subplan then change the old
-				 * subplan index to the new subplan index.  The new index may
-				 * become -1 if the partition was pruned above, or it may just
-				 * come earlier in the subplan list due to some subplans being
-				 * removed earlier in the list.  If it's a subpartition, add
-				 * it to present_parts unless it's entirely pruned.
-				 */
-				if (oldidx >= 0)
-				{
-					Assert(oldidx < nsubplans);
-					pprune->subplan_map[j] = new_subplan_indexes[oldidx];
+				pprune = &prunedata->partrelprunedata[j];
+				nparts = pprune->context.nparts;
+				/* We just rebuild present_parts from scratch */
+				bms_free(pprune->present_parts);
+				pprune->present_parts = NULL;
 
-					if (new_subplan_indexes[oldidx] >= 0)
-						pprune->present_parts =
-							bms_add_member(pprune->present_parts, j);
-				}
-				else if ((subidx = pprune->subpart_map[j]) >= 0)
+				for (k = 0; k < nparts; k++)
 				{
-					PartitionPruningData *subprune;
+					int			oldidx = pprune->subplan_map[k];
+					int			subidx;
 
-					subprune = &prunestate->partprunedata[subidx];
+					/*
+					 * If this partition existed as a subplan then change the
+					 * old subplan index to the new subplan index.  The new
+					 * index may become -1 if the partition was pruned above,
+					 * or it may just come earlier in the subplan list due to
+					 * some subplans being removed earlier in the list.  If
+					 * it's a subpartition, add it to present_parts unless
+					 * it's entirely pruned.
+					 */
+					if (oldidx >= 0)
+					{
+						Assert(oldidx < nsubplans);
+						pprune->subplan_map[k] = new_subplan_indexes[oldidx];
 
-					if (!bms_is_empty(subprune->present_parts))
-						pprune->present_parts =
-							bms_add_member(pprune->present_parts, j);
+						if (new_subplan_indexes[oldidx] >= 0)
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, k);
+					}
+					else if ((subidx = pprune->subpart_map[k]) >= 0)
+					{
+						PartitionedRelPruningData *subprune;
+
+						subprune = &prunedata->partrelprunedata[subidx];
+
+						if (!bms_is_empty(subprune->present_parts))
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, k);
+					}
 				}
 			}
 		}
-
 		pfree(new_subplan_indexes);
 	}
 
@@ -1702,11 +1752,9 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 Bitmapset *
 ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 {
-	PartitionPruningData *pprune;
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
-
-	pprune = prunestate->partprunedata;
+	int			i;
 
 	/*
 	 * Switch to a temp context to avoid leaking memory in the executor's
@@ -1714,16 +1762,29 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	find_matching_subplans_recurse(prunestate, pprune, false, &result);
+	for (i = 0; i < prunestate->num_partprunedata; i++)
+	{
+		PartitionPruningData *pprune;
+		PartitionedRelPruningData *prelprune;
+
+		pprune = prunestate->partprunedata[i];
+		prelprune = &pprune->partrelprunedata[0];
+
+		find_matching_subplans_recurse(pprune, prelprune, false, &result);
+
+		/* Expression eval may have used space in node's ps_ExprContext too */
+		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
 	result = bms_copy(result);
 
+	/* Add in any subplans which partition pruning didn't account for */
+	result = bms_add_members(result, prunestate->other_subplans);
+
 	MemoryContextReset(prunestate->prune_context);
-	/* Expression eval may have used space in node's ps_ExprContext too */
-	ResetExprContext(pprune->context.planstate->ps_ExprContext);
 
 	return result;
 }
@@ -1736,8 +1797,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
  * Adds valid (non-prunable) subplan IDs to *validsubplans
  */
 static void
-find_matching_subplans_recurse(PartitionPruneState *prunestate,
-							   PartitionPruningData *pprune,
+find_matching_subplans_recurse(PartitionPruningData *pprune,
+							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans)
 {
@@ -1748,15 +1809,16 @@ find_matching_subplans_recurse(PartitionPruneState *prunestate,
 	check_stack_depth();
 
 	/* Only prune if pruning would be useful at this level. */
-	if (initial_prune ? pprune->do_initial_prune : pprune->do_exec_prune)
+	if (initial_prune ? prelprune->do_initial_prune :
+		prelprune->do_exec_prune)
 	{
-		PartitionPruneContext *context = &pprune->context;
+		PartitionPruneContext *context = &prelprune->context;
 
 		/* Set whether we can evaluate PARAM_EXEC Params or not */
 		context->evalexecparams = !initial_prune;
 
 		partset = get_matching_partitions(context,
-										  pprune->pruning_steps);
+										  prelprune->pruning_steps);
 	}
 	else
 	{
@@ -1764,23 +1826,23 @@ find_matching_subplans_recurse(PartitionPruneState *prunestate,
 		 * If no pruning is to be done, just include all partitions at this
 		 * level.
 		 */
-		partset = pprune->present_parts;
+		partset = prelprune->present_parts;
 	}
 
 	/* Translate partset into subplan indexes */
 	i = -1;
 	while ((i = bms_next_member(partset, i)) >= 0)
 	{
-		if (pprune->subplan_map[i] >= 0)
+		if (prelprune->subplan_map[i] >= 0)
 			*validsubplans = bms_add_member(*validsubplans,
-											pprune->subplan_map[i]);
+											prelprune->subplan_map[i]);
 		else
 		{
-			int			partidx = pprune->subpart_map[i];
+			int			partidx = prelprune->subpart_map[i];
 
 			if (partidx >= 0)
-				find_matching_subplans_recurse(prunestate,
-											   &prunestate->partprunedata[partidx],
+				find_matching_subplans_recurse(pprune,
+											   &pprune->partrelprunedata[partidx],
 											   initial_prune, validsubplans);
 			else
 			{
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5ce4fb43e1..1fc55e90d7 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -129,7 +129,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
 
 	/* If run-time partition pruning is enabled, then set that up now */
-	if (node->part_prune_infos != NIL)
+	if (node->part_prune_info != NULL)
 	{
 		PartitionPruneState *prunestate;
 
@@ -138,7 +138,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 
 		/* Create the working data structure for pruning. */
 		prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-												   node->part_prune_infos);
+												   node->part_prune_info);
 		appendstate->as_prune_state = prunestate;
 
 		/* Perform an initial partition prune, if required. */
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 1c12075b01..39618323fc 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -245,7 +245,7 @@ _copyAppend(const Append *from)
 	COPY_NODE_FIELD(appendplans);
 	COPY_SCALAR_FIELD(first_partial_plan);
 	COPY_NODE_FIELD(partitioned_rels);
-	COPY_NODE_FIELD(part_prune_infos);
+	COPY_NODE_FIELD(part_prune_info);
 
 	return newnode;
 }
@@ -1181,6 +1181,17 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
 {
 	PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
 
+	COPY_NODE_FIELD(prune_infos);
+	COPY_BITMAPSET_FIELD(other_subplans);
+
+	return newnode;
+}
+
+static PartitionedRelPruneInfo *
+_copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
+{
+	PartitionedRelPruneInfo *newnode = makeNode(PartitionedRelPruneInfo);
+
 	COPY_SCALAR_FIELD(reloid);
 	COPY_NODE_FIELD(pruning_steps);
 	COPY_BITMAPSET_FIELD(present_parts);
@@ -4907,6 +4918,9 @@ copyObjectImpl(const void *from)
 		case T_PartitionPruneInfo:
 			retval = _copyPartitionPruneInfo(from);
 			break;
+		case T_PartitionedRelPruneInfo:
+			retval = _copyPartitionedRelPruneInfo(from);
+			break;
 		case T_PartitionPruneStepOp:
 			retval = _copyPartitionPruneStepOp(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a88c0aecd0..1d78b53754 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -402,7 +402,7 @@ _outAppend(StringInfo str, const Append *node)
 	WRITE_NODE_FIELD(appendplans);
 	WRITE_INT_FIELD(first_partial_plan);
 	WRITE_NODE_FIELD(partitioned_rels);
-	WRITE_NODE_FIELD(part_prune_infos);
+	WRITE_NODE_FIELD(part_prune_info);
 }
 
 static void
@@ -1012,10 +1012,19 @@ _outPlanRowMark(StringInfo str, const PlanRowMark *node)
 
 static void
 _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
+{
+	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
+
+	WRITE_NODE_FIELD(prune_infos);
+	WRITE_BITMAPSET_FIELD(other_subplans);
+}
+
+static void
+_outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
 {
 	int			i;
 
-	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
+	WRITE_NODE_TYPE("PARTITIONEDRELPRUNEINFO");
 
 	WRITE_OID_FIELD(reloid);
 	WRITE_NODE_FIELD(pruning_steps);
@@ -3829,6 +3838,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionPruneInfo:
 				_outPartitionPruneInfo(str, obj);
 				break;
+			case T_PartitionedRelPruneInfo:
+				_outPartitionedRelPruneInfo(str, obj);
+				break;
 			case T_PartitionPruneStepOp:
 				_outPartitionPruneStepOp(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42aff7f57a..ea4e8df62e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1612,7 +1612,7 @@ _readAppend(void)
 	READ_NODE_FIELD(appendplans);
 	READ_INT_FIELD(first_partial_plan);
 	READ_NODE_FIELD(partitioned_rels);
-	READ_NODE_FIELD(part_prune_infos);
+	READ_NODE_FIELD(part_prune_info);
 
 	READ_DONE();
 }
@@ -2328,6 +2328,17 @@ _readPartitionPruneInfo(void)
 {
 	READ_LOCALS(PartitionPruneInfo);
 
+	READ_NODE_FIELD(prune_infos);
+	READ_BITMAPSET_FIELD(other_subplans);
+
+	READ_DONE();
+}
+
+static PartitionedRelPruneInfo *
+_readPartitionedRelPruneInfo(void)
+{
+	READ_LOCALS(PartitionedRelPruneInfo);
+
 	READ_OID_FIELD(reloid);
 	READ_NODE_FIELD(pruning_steps);
 	READ_BITMAPSET_FIELD(present_parts);
@@ -2725,6 +2736,8 @@ parseNodeString(void)
 		return_value = _readPlanRowMark();
 	else if (MATCH("PARTITIONPRUNEINFO", 18))
 		return_value = _readPartitionPruneInfo();
+	else if (MATCH("PARTITIONEDRELPRUNEINFO", 23))
+		return_value = _readPartitionedRelPruneInfo();
 	else if (MATCH("PARTITIONPRUNESTEPOP", 20))
 		return_value = _readPartitionPruneStepOp();
 	else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3ada379f8b..f64b55d45a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1388,7 +1388,6 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	List	   *all_child_outers = NIL;
 	ListCell   *l;
 	List	   *partitioned_rels = NIL;
-	bool		build_partitioned_rels = false;
 	double		partial_rows = -1;
 
 	/* If appropriate, consider parallel append */
@@ -1413,10 +1412,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (rel->part_scheme != NULL)
 	{
 		if (IS_SIMPLE_REL(rel))
-			partitioned_rels = rel->partitioned_child_rels;
+			partitioned_rels = list_make1(rel->partitioned_child_rels);
 		else if (IS_JOIN_REL(rel))
 		{
 			int			relid = -1;
+			List	   *partrels = NIL;
 
 			/*
 			 * For a partitioned joinrel, concatenate the component rels'
@@ -1430,16 +1430,16 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 				component = root->simple_rel_array[relid];
 				Assert(component->part_scheme != NULL);
 				Assert(list_length(component->partitioned_child_rels) >= 1);
-				partitioned_rels =
-					list_concat(partitioned_rels,
+				partrels =
+					list_concat(partrels,
 								list_copy(component->partitioned_child_rels));
 			}
+
+			partitioned_rels = list_make1(partrels);
 		}
 
 		Assert(list_length(partitioned_rels) >= 1);
 	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-		build_partitioned_rels = true;
 
 	/*
 	 * For every non-dummy child, remember the cheapest path.  Also, identify
@@ -1453,17 +1453,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		Path	   *cheapest_partial_path = NULL;
 
 		/*
-		 * If we need to build partitioned_rels, accumulate the partitioned
-		 * rels for this child.  We must ensure that parents are always listed
-		 * before their child partitioned tables.
+		 * For UNION ALLs with non-empty partitioned_child_rels, accumulate
+		 * the Lists of child relations.
 		 */
-		if (build_partitioned_rels)
-		{
-			List	   *cprels = childrel->partitioned_child_rels;
-
-			partitioned_rels = list_concat(partitioned_rels,
-										   list_copy(cprels));
-		}
+		if (rel->rtekind == RTE_SUBQUERY && childrel->partitioned_child_rels != NIL)
+			partitioned_rels = lappend(partitioned_rels,
+									   childrel->partitioned_child_rels);
 
 		/*
 		 * If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 165a9e9b8e..ec660f5627 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -124,6 +124,7 @@ static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
 static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 					  List **qual, List **indexqual, List **indexECs);
 static void bitmap_subplan_mark_shared(Plan *plan);
+static List *flatten_partitioned_rels(List *partitioned_rels);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 					List *tlist, List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
@@ -202,7 +203,8 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
 static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
 				   Index scanrelid, int wtParam);
 static Append *make_append(List *appendplans, int first_partial_plan,
-			List *tlist, List *partitioned_rels, List *partpruneinfos);
+			List *tlist, List *partitioned_rels,
+			PartitionPruneInfo *partpruneinfo);
 static RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree,
 					 Plan *righttree,
@@ -1030,7 +1032,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	List	   *subplans = NIL;
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
-	List	   *partpruneinfos = NIL;
+	PartitionPruneInfo *partpruneinfo = NULL;
 
 	/*
 	 * The subpaths list could be empty, if every child was proven empty by
@@ -1090,13 +1092,12 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 		/*
 		 * If any quals exist, they may be useful to perform further partition
-		 * pruning during execution.  Generate a PartitionPruneInfo for each
-		 * partitioned rel to store these quals and allow translation of
-		 * partition indexes into subpath indexes.
+		 * pruning during execution.  Attempt to generate a PartitionPruneInfo
+		 * object to allow further pruning to be done during execution.
 		 */
 		if (prunequal != NIL)
-			partpruneinfos =
-				make_partition_pruneinfo(root,
+			partpruneinfo =
+				make_partition_pruneinfo(root, rel,
 										 best_path->partitioned_rels,
 										 best_path->subpaths, prunequal);
 	}
@@ -1110,7 +1111,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 	plan = make_append(subplans, best_path->first_partial_path,
 					   tlist, best_path->partitioned_rels,
-					   partpruneinfos);
+					   partpruneinfo);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
@@ -1218,7 +1219,8 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 		subplans = lappend(subplans, subplan);
 	}
 
-	node->partitioned_rels = best_path->partitioned_rels;
+	node->partitioned_rels =
+		flatten_partitioned_rels(best_path->partitioned_rels);
 	node->mergeplans = subplans;
 
 	return (Plan *) node;
@@ -4968,6 +4970,29 @@ bitmap_subplan_mark_shared(Plan *plan)
 		elog(ERROR, "unrecognized node type: %d", nodeTag(plan));
 }
 
+/*
+ * flatten_partitioned_rels
+ *		Convert List of Lists into a single List with all elements from the
+ *		sub-lists.
+ */
+static List *
+flatten_partitioned_rels(List *partitioned_rels)
+{
+	ListCell   *lc;
+	List	   *newlist = NIL;
+
+	foreach(lc, partitioned_rels)
+	{
+		List	   *sublist = lfirst(lc);
+
+		Assert(sublist->type == T_IntList);
+
+		newlist = list_concat(newlist, list_copy(sublist));
+	}
+
+	return newlist;
+}
+
 /*****************************************************************************
  *
  *	PLAN NODE BUILDING ROUTINES
@@ -5311,7 +5336,7 @@ make_foreignscan(List *qptlist,
 static Append *
 make_append(List *appendplans, int first_partial_plan,
 			List *tlist, List *partitioned_rels,
-			List *partpruneinfos)
+			PartitionPruneInfo *partpruneinfo)
 {
 	Append	   *node = makeNode(Append);
 	Plan	   *plan = &node->plan;
@@ -5322,8 +5347,8 @@ make_append(List *appendplans, int first_partial_plan,
 	plan->righttree = NULL;
 	node->appendplans = appendplans;
 	node->first_partial_plan = first_partial_plan;
-	node->partitioned_rels = partitioned_rels;
-	node->part_prune_infos = partpruneinfos;
+	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
+	node->part_prune_info = partpruneinfo;
 	return node;
 }
 
@@ -6480,7 +6505,7 @@ make_modifytable(PlannerInfo *root,
 	node->operation = operation;
 	node->canSetTag = canSetTag;
 	node->nominalRelation = nominalRelation;
-	node->partitioned_rels = partitioned_rels;
+	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
 	node->partColsUpdated = partColsUpdated;
 	node->resultRelations = resultRelations;
 	node->resultRelIndex = -1;	/* will be set correctly in setrefs.c */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 602418f287..ea68b6ed66 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1616,6 +1616,7 @@ inheritance_planner(PlannerInfo *root)
 		 * contain at least one member, that is, the root parent's index.
 		 */
 		Assert(list_length(partitioned_rels) >= 1);
+		partitioned_rels = list_make1(partitioned_rels);
 	}
 
 	/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 354eb0d4e6..9ce216c28b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -111,7 +111,11 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
-
+static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
+							  RelOptInfo *parentrel,
+							  int *relid_subplan_map,
+							  List *partitioned_rels, List *prunequal,
+							  Bitmapset **matchedsubplans);
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
 static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -160,8 +164,8 @@ static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
 static Bitmapset *pull_exec_paramids(Expr *expr);
 static bool pull_exec_paramids_walker(Node *node, Bitmapset **context);
-static bool analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps,
-					  int partnatts);
+static bool analyze_partkey_exprs(PartitionedRelPruneInfo *prelinfo,
+					  List *steps, int partnatts);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
 static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
@@ -176,38 +180,37 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context,
 
 /*
  * make_partition_pruneinfo
- *		Build List of PartitionPruneInfos, one for each partitioned rel.
- *		These can be used in the executor to allow additional partition
- *		pruning to take place.
- *
- * Here we generate partition pruning steps for 'prunequal' and also build a
- * data structure which allows mapping of partition indexes into 'subpaths'
- * indexes.
+ *		Builds a PartitionPruneInfo which can be used in the executor to allow
+ *		additional partition pruning to take place.  Returns NULL when
+ *		partition pruning would be useless.
  *
- * If no non-Const expressions are being compared to the partition key in any
- * of the 'partitioned_rels', then we return NIL to indicate no run-time
- * pruning should be performed.  Run-time pruning would be useless, since the
- * pruning done during planning will have pruned everything that can be.
+ * 'partitioned_rels' is a List containing Lists of partitioned relids.  Here
+ * we attempt to populate the PartitionPruneInfo adding a 'prune_infos' item
+ * for each item in the 'partitioned_rels' list.  However, some of the sets of
+ * partitioned relations may not require any run-time pruning.  In these cases
+ * we'll simply not add a 'prune_infos' item for that set and instead we'll
+ * add all the subplans which belong to that set into the PartitionPruneInfo's
+ * 'other_subplans' field.  Callers will likely never want to prune subplans
+ * which are mentioned in this field.
  */
-List *
-make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
-						 List *subpaths, List *prunequal)
+PartitionPruneInfo *
+make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
+						 List *partitioned_rels, List *subpaths,
+						 List *prunequal)
 {
-	RelOptInfo *targetpart = NULL;
-	List	   *pinfolist = NIL;
-	bool		doruntimeprune = false;
+	PartitionPruneInfo *pruneinfo;
+	Bitmapset  *allmatchedsubplans = NULL;
 	int		   *relid_subplan_map;
-	int		   *relid_subpart_map;
 	ListCell   *lc;
+	List	   *prunerelinfos;
 	int			i;
 
 	/*
-	 * Construct two temporary arrays to map from planner relids to subplan
-	 * and sub-partition indexes.  For convenience, we use 1-based indexes
-	 * here, so that zero can represent an un-filled array entry.
+	 * Construct a temporary array to map from planner relids to subplan
+	 * indexes.  For convenience, we use 1-based indexes here, so that zero
+	 * can represent an un-filled array entry.
 	 */
 	relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
-	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
 
 	/*
 	 * relid_subplan_map maps relid of a leaf partition to the index in
@@ -227,10 +230,110 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		relid_subplan_map[pathrel->relid] = i++;
 	}
 
+	Assert(partitioned_rels->type == T_List);
+
+	prunerelinfos = NIL;
+
+	/* We now build a PartitionedRelPruneInfo for each partitioned rel. */
+	foreach(lc, partitioned_rels)
+	{
+		List	   *rels = lfirst(lc);
+		List	   *prelinfolist;
+		Bitmapset  *matchedsubplans = NULL;
+
+		prelinfolist = make_partitionedrel_pruneinfo(root, parentrel,
+													 relid_subplan_map,
+													 rels, prunequal,
+													 &matchedsubplans);
+
+		/* When pruning is possible, record the matched subplans */
+		if (prelinfolist != NIL)
+		{
+			prunerelinfos = lappend(prunerelinfos, prelinfolist);
+			allmatchedsubplans = bms_join(matchedsubplans,
+										  allmatchedsubplans);
+		}
+	}
+
+	pfree(relid_subplan_map);
+
+	/*
+	 * If none of the partition hierarchies had any useful run-time pruning
+	 * quals then we can safely not bother with run-time pruning.
+	 */
+	if (prunerelinfos == NIL)
+		return NULL;
+
+	pruneinfo = makeNode(PartitionPruneInfo);
+	pruneinfo->prune_infos = prunerelinfos;
+
+	/*
+	 * Some subplans may not belong to any of the listed partitioned_rels.  This
+	 * can happen for UNION ALL queries which include a non-partitioned table.
+	 * We record all of the subplans which we didn't build any
+	 * PartitionedRelPruneInfo for so that callers can easily identify which
+	 * subplans should not be pruned.
+	 */
+	if (bms_num_members(allmatchedsubplans) < list_length(subpaths))
+	{
+		Bitmapset  *other_subplans;
+
+		/* Create an inverted set of allmatchedsubplans */
+		other_subplans = bms_add_range(NULL, 0, list_length(subpaths) - 1);
+		other_subplans = bms_del_members(other_subplans, allmatchedsubplans);
+
+		pruneinfo->other_subplans = other_subplans;
+	}
+	else
+		pruneinfo->other_subplans = NULL;
+
+	return pruneinfo;
+}
+
+/*
+ * make_partitionedrel_pruneinfo
+ *		Build a List of PartitionedRelPruneInfos, one for each partitioned
+ *		rel.  These can be used in the executor to allow additional partition
+ *		pruning to take place.
+ *
+ * Here we generate partition pruning steps for 'prunequal' and also build a
+ * data structure which allows mapping of partition indexes into 'subpaths'
+ * indexes.
+ *
+ * If no non-Const expressions are being compared to the partition key in any
+ * of the 'partitioned_rels', then we return NIL to indicate no run-time
+ * pruning should be performed.  Run-time pruning would be useless since the
+ * pruning done during planning will have pruned everything that can be.
+ *
+ * On non-NIL return, 'matchedsubplans' is set to the subplan indexes which
+ * were matched to this partition hierarchy.
+ */
+static List *
+make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
+							  int *relid_subplan_map,
+							  List *partitioned_rels, List *prunequal,
+							  Bitmapset **matchedsubplans)
+{
+	RelOptInfo *targetpart = NULL;
+	List	   *prelinfolist = NIL;
+	bool		doruntimeprune = false;
+	bool		hascontradictingquals = false;
+	ListCell   *lc;
+	int		   *relid_subpart_map;
+	Bitmapset  *subplansfound = NULL;
+	int			i;
+
+	/*
+	 * Construct a temporary array to map from planner relids to index of the
+	 * partitioned_rel.  For convenience, we use 1-based indexes here, so that
+	 * zero can represent an un-filled array entry.
+	 */
+	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
+
 	/*
 	 * relid_subpart_map maps relid of a non-leaf partition to the index in
 	 * 'partitioned_rels' of that rel (which will also be the index in the
-	 * returned PartitionPruneInfo list of the info for that partition).
+	 * returned PartitionedRelPruneInfo list of the info for that partition).
 	 */
 	i = 1;
 	foreach(lc, partitioned_rels)
@@ -246,12 +349,12 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		relid_subpart_map[rti] = i++;
 	}
 
-	/* We now build a PartitionPruneInfo for each partitioned rel */
+	/* We now build a PartitionedRelPruneInfo for each partitioned rel */
 	foreach(lc, partitioned_rels)
 	{
 		Index		rti = lfirst_int(lc);
 		RelOptInfo *subpart = find_base_rel(root, rti);
-		PartitionPruneInfo *pinfo;
+		PartitionedRelPruneInfo *prelinfo;
 		RangeTblEntry *rte;
 		Bitmapset  *present_parts;
 		int			nparts = subpart->nparts;
@@ -269,6 +372,31 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		if (!targetpart)
 		{
 			targetpart = subpart;
+
+			/*
+			 * When the first listed partitioned table is not the same rel as
+			 * 'parentrel', then we must be dealing with a UNION ALL
+			 * parentrel.  We'd better translate the pruning qual so that it's
+			 * compatible with the top-level partitioned table.  We overwrite
+			 * the input parameter here so that subsequent translations for
+			 * sub-partitioned tables translate from the top-level partitioned
+			 * table, rather than the UNION ALL parent.
+			 */
+			if (parentrel != subpart)
+			{
+				int			nappinfos;
+				AppendRelInfo **appinfos = find_appinfos_by_relids(root,
+																   subpart->relids,
+																   &nappinfos);
+
+				prunequal = (List *) adjust_appendrel_attrs(root, (Node *)
+															prunequal,
+															nappinfos,
+															appinfos);
+
+				pfree(appinfos);
+			}
+
 			partprunequal = prunequal;
 		}
 		else
@@ -320,35 +448,44 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 
 			subplan_map[i] = subplanidx;
 			subpart_map[i] = subpartidx;
-			if (subplanidx >= 0 || subpartidx >= 0)
+			if (subplanidx >= 0)
+			{
+				present_parts = bms_add_member(present_parts, i);
+
+				/* Record finding this subplan  */
+				subplansfound = bms_add_member(subplansfound, subplanidx);
+			}
+			else if (subpartidx >= 0)
 				present_parts = bms_add_member(present_parts, i);
 		}
 
+
 		rte = root->simple_rte_array[subpart->relid];
 
-		pinfo = makeNode(PartitionPruneInfo);
-		pinfo->reloid = rte->relid;
-		pinfo->pruning_steps = pruning_steps;
-		pinfo->present_parts = present_parts;
-		pinfo->nparts = nparts;
-		pinfo->subplan_map = subplan_map;
-		pinfo->subpart_map = subpart_map;
+		prelinfo = makeNode(PartitionedRelPruneInfo);
+		prelinfo->reloid = rte->relid;
+		prelinfo->pruning_steps = pruning_steps;
+		prelinfo->present_parts = present_parts;
+		prelinfo->nparts = nparts;
+		prelinfo->subplan_map = subplan_map;
+		prelinfo->subpart_map = subpart_map;
 
 		/* Determine which pruning types should be enabled at this level */
-		doruntimeprune |= analyze_partkey_exprs(pinfo, pruning_steps,
+		doruntimeprune |= analyze_partkey_exprs(prelinfo, pruning_steps,
 												partnatts);
 
-		pinfolist = lappend(pinfolist, pinfo);
+		prelinfolist = lappend(prelinfolist, prelinfo);
 	}
 
-	pfree(relid_subplan_map);
 	pfree(relid_subpart_map);
 
-	if (doruntimeprune)
-		return pinfolist;
+	if (!doruntimeprune)
+		return NIL;
+
+	*matchedsubplans = subplansfound;
 
 	/* No run-time pruning required. */
-	return NIL;
+	return prelinfolist;
 }
 
 /*
@@ -2758,10 +2895,11 @@ pull_exec_paramids_walker(Node *node, Bitmapset **context)
  *		executor startup-time or executor run-time pruning.
  *
  * Returns true if any executor partition pruning should be attempted at this
- * level.  Also fills fields of *pinfo to record how to process each step.
+ * level.  Also fills fields of *prelinfo to record how to process each step.
  */
 static bool
-analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
+analyze_partkey_exprs(PartitionedRelPruneInfo *prelinfo, List *steps,
+					  int partnatts)
 {
 	bool		doruntimeprune = false;
 	ListCell   *lc;
@@ -2771,11 +2909,12 @@ analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 	 * Otherwise, if their expressions aren't simple Consts, they require
 	 * startup-time pruning.
 	 */
-	pinfo->nexprs = list_length(steps) * partnatts;
-	pinfo->hasexecparam = (bool *) palloc0(sizeof(bool) * pinfo->nexprs);
-	pinfo->do_initial_prune = false;
-	pinfo->do_exec_prune = false;
-	pinfo->execparamids = NULL;
+	prelinfo->nexprs = list_length(steps) * partnatts;
+	prelinfo->hasexecparam = (bool *) palloc0(sizeof(bool) *
+											  prelinfo->nexprs);
+	prelinfo->do_initial_prune = false;
+	prelinfo->do_exec_prune = false;
+	prelinfo->execparamids = NULL;
 
 	foreach(lc, steps)
 	{
@@ -2799,16 +2938,16 @@ analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 														step->step.step_id,
 														keyno);
 
-				Assert(stateidx < pinfo->nexprs);
+				Assert(stateidx < prelinfo->nexprs);
 				hasexecparams = !bms_is_empty(execparamids);
-				pinfo->hasexecparam[stateidx] = hasexecparams;
-				pinfo->execparamids = bms_join(pinfo->execparamids,
-											   execparamids);
+				prelinfo->hasexecparam[stateidx] = hasexecparams;
+				prelinfo->execparamids = bms_join(prelinfo->execparamids,
+												  execparamids);
 
 				if (hasexecparams)
-					pinfo->do_exec_prune = true;
+					prelinfo->do_exec_prune = true;
 				else
-					pinfo->do_initial_prune = true;
+					prelinfo->do_initial_prune = true;
 
 				doruntimeprune = true;
 			}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 862bf65060..4327fd4cb1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -113,13 +113,13 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 /*-----------------------
- * PartitionPruningData - Per-partitioned-table data for run-time pruning
+ * PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
  * of partitions.  For a multilevel partitioned table, we have one of these
  * for the topmost partition plus one for each non-leaf child partition,
  * ordered such that parents appear before their children.
  *
  * subplan_map[] and subpart_map[] have the same definitions as in
- * PartitionPruneInfo (see plannodes.h); though note that here,
+ * PartitionedRelPruneInfo (see plannodes.h); though note that here,
  * subpart_map contains indexes into PartitionPruneState.partprunedata[].
  *
  * subplan_map					Subplan index by partition index, or -1.
@@ -136,7 +136,7 @@ typedef struct PartitionTupleRouting
  *								executor run (for this partitioning level).
  *-----------------------
  */
-typedef struct PartitionPruningData
+typedef struct PartitionedRelPruningData
 {
 	int		   *subplan_map;
 	int		   *subpart_map;
@@ -145,6 +145,17 @@ typedef struct PartitionPruningData
 	List	   *pruning_steps;
 	bool		do_initial_prune;
 	bool		do_exec_prune;
+} PartitionedRelPruningData;
+
+/*
+ * PartitionPruningData - Encapsulates an array of PartitionedRelPruningData
+ * which belong to a single partition hierarchy containing 1 or more
+ * partitions.
+ */
+typedef struct PartitionPruningData
+{
+	int			num_partrelprunedata;
+	PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
 } PartitionPruningData;
 
 /*-----------------------
@@ -158,9 +169,6 @@ typedef struct PartitionPruningData
  * table per parent plan node, hence partprunedata[] need describe only one
  * partitioning hierarchy.
  *
- * partprunedata		Array of PartitionPruningData for the plan's
- *						partitioned relation, ordered such that parent tables
- *						appear before children (hence, topmost table is first).
  * num_partprunedata	Number of items in 'partprunedata' array.
  * do_initial_prune		true if pruning should be performed during executor
  *						startup (at any hierarchy level).
@@ -170,18 +178,27 @@ typedef struct PartitionPruningData
  *						any of the partprunedata structs.  Pruning must be
  *						done again each time the value of one of these
  *						parameters changes.
+ * other_subplans		Contains subplan indexes which don't belong to any
+ *						'partprunedata', e.g. UNION ALL children that are not
+ *						partitioned tables or a partitioned table that the
+ *						planner deemed run-time pruning to be useless for.
+ *						These must not be pruned.
  * prune_context		A short-lived memory context in which to execute the
  *						partition pruning functions.
+ * partprunedata		Array of PartitionPruningData pointers for the plan's
+ *						partitioned relation, ordered such that parent tables
+ *						appear before children (hence, topmost table is first).
  *-----------------------
  */
 typedef struct PartitionPruneState
 {
-	PartitionPruningData *partprunedata;
 	int			num_partprunedata;
 	bool		do_initial_prune;
 	bool		do_exec_prune;
 	Bitmapset  *execparamids;
+	Bitmapset  *other_subplans;
 	MemoryContext prune_context;
+	PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
 } PartitionPruneState;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
@@ -209,7 +226,7 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 						PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-							  List *partitionpruneinfo);
+							  PartitionPruneInfo *partitionpruneinfo);
 extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 43f1552241..697d3d7a5f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -88,6 +88,7 @@ typedef enum NodeTag
 	T_NestLoopParam,
 	T_PlanRowMark,
 	T_PartitionPruneInfo,
+	T_PartitionedRelPruneInfo,
 	T_PartitionPruneStepOp,
 	T_PartitionPruneStepCombine,
 	T_PlanInvalItem,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 5201c6d4bc..b341aa7f35 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -241,6 +241,8 @@ typedef struct ModifyTable
 	List	   *exclRelTlist;	/* tlist of the EXCLUDED pseudo relation */
 } ModifyTable;
 
+struct PartitionPruneInfo;
+
 /* ----------------
  *	 Append node -
  *		Generate the concatenation of the results of sub-plans.
@@ -260,8 +262,8 @@ typedef struct Append
 	/* RT indexes of non-leaf tables in a partition tree */
 	List	   *partitioned_rels;
 
-	/* Info for run-time subplan pruning, one entry per partitioned_rels */
-	List	   *part_prune_infos;	/* List of PartitionPruneInfo */
+	/* Info for run-time subplan pruning */
+	struct PartitionPruneInfo *part_prune_info;
 } Append;
 
 /* ----------------
@@ -1051,18 +1053,34 @@ typedef struct PlanRowMark
  */
 
 /*
- * PartitionPruneInfo - Details required to allow the executor to prune
+ * PartitionPruneInfo - Details required to allow the executor to prune
  * partitions.
  *
+ * prune_infos			List of Lists containing PartitionedRelPruneInfo
+ * other_subplans		Indexes of any subplans which are not accounted for
+ *						by any of the PartitionedRelPruneInfo stored in
+ *						'prune_infos'.
+ */
+typedef struct PartitionPruneInfo
+{
+	NodeTag		type;
+	List	   *prune_infos;
+	Bitmapset  *other_subplans;
+} PartitionPruneInfo;
+
+/*
+ * PartitionedRelPruneInfo - Details required to allow the executor to prune
+ * partitions for a single partitioned table.
+ *
  * Here we store mapping details to allow translation of a partitioned table's
  * index as returned by the partition pruning code into subplan indexes for
  * plan types which support arbitrary numbers of subplans, such as Append.
  * We also store various details to tell the executor when it should be
  * performing partition pruning.
  *
- * Each PartitionPruneInfo describes the partitioning rules for a single
+ * Each PartitionedRelPruneInfo describes the partitioning rules for a single
  * partitioned table (a/k/a level of partitioning).  For a multilevel
- * partitioned table, we have a List of PartitionPruneInfos, where the
+ * partitioned table, we have a List of PartitionedRelPruneInfo, where the
  * first entry represents the topmost partitioned table and additional
  * entries represent non-leaf child partitions, ordered such that parents
  * appear before their children.
@@ -1073,11 +1091,12 @@ typedef struct PlanRowMark
  * zero-based index of the partition's subplan in the parent plan's subplan
  * list; it is -1 if the partition is non-leaf or has been pruned.  For a
  * non-leaf partition p, subpart_map[p] contains the zero-based index of
- * that sub-partition's PartitionPruneInfo in the plan's PartitionPruneInfo
- * list; it is -1 if the partition is a leaf or has been pruned.  All these
- * indexes are global across the whole partitioned table and Append plan node.
+ * that sub-partition's PartitionedRelPruneInfo in the plan's
+ * PartitionedRelPruneInfo list; it is -1 if the partition is a leaf or has
+ * been pruned.  All these indexes are global across the whole partitioned
+ * table and the parenting plan node.
  */
-typedef struct PartitionPruneInfo
+typedef struct PartitionedRelPruneInfo
 {
 	NodeTag		type;
 	Oid			reloid;			/* OID of partition rel for this level */
@@ -1095,7 +1114,7 @@ typedef struct PartitionPruneInfo
 	bool		do_exec_prune;	/* true if pruning should be performed during
 								 * executor run. */
 	Bitmapset  *execparamids;	/* All PARAM_EXEC Param IDs in pruning_steps */
-} PartitionPruneInfo;
+} PartitionedRelPruneInfo;
 
 /*
  * Abstract Node type for partition pruning steps (there are no concrete
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 9944d2832f..df3bcb737d 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -74,7 +74,8 @@ typedef struct PartitionPruneContext
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
 	((partnatts) * (step_id) + (keyno))
 
-extern List *make_partition_pruneinfo(PlannerInfo *root,
+extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root,
+						 RelOptInfo *parentrel,
 						 List *partitioned_rels,
 						 List *subpaths, List *prunequal);
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 022b7c55c7..bff3c8c12d 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2367,6 +2367,96 @@ select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1
                Index Cond: (a = $0)
 (52 rows)
 
+-- Test run-time partition pruning with UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all select * from ab) ab where b = (select 1);
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Append (actual rows=0 loops=1)
+         ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_b1_1 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_b2_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_b3_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Index Cond: (a = 1)
+   ->  Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b3 (never executed)
+         Filter: (b = $0)
+(37 rows)
+
+-- A case containing a UNION ALL with a non-partitioned child.
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all (values(10,5)) union all select * from ab) ab where b = (select 1);
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Append (actual rows=0 loops=1)
+         ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_b1_1 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_b2_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_b3_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Index Cond: (a = 1)
+   ->  Result (actual rows=0 loops=1)
+         One-Time Filter: (5 = $0)
+   ->  Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b3 (never executed)
+         Filter: (b = $0)
+(39 rows)
+
 deallocate ab_q1;
 deallocate ab_q2;
 deallocate ab_q3;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 2357f02cde..9b14ca755f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -533,6 +533,14 @@ reset max_parallel_workers_per_gather;
 explain (analyze, costs off, summary off, timing off)
 select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1 from lprt_a);
 
+-- Test run-time partition pruning with UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all select * from ab) ab where b = (select 1);
+
+-- A case containing a UNION ALL with a non-partitioned child.
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all (values(10,5)) union all select * from ab) ab where b = (select 1);
+
 deallocate ab_q1;
 deallocate ab_q2;
 deallocate ab_q3;
-- 
2.17.1

#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#24)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018-Jul-16, David Rowley wrote:

On 16 July 2018 at 12:55, David Rowley <david.rowley@2ndquadrant.com> wrote:

Thinking about this some more, I don't quite see any reason that the
partitioned_rels for a single hierarchy couldn't just be a Bitmapset
instead of an IntList.

Of course, this is not possible since we can't pass a List of
Bitmapsets to the executor due to Bitmapset not being a node type.

Maybe we can just add a new node type that wraps a lone bitmapset. The
naive implementation (just print out individual members) should be
pretty straightforward; a more sophisticated one (print out the "words")
may end up more compact or not depending on density, but much harder for
debugging, and probably means a catversion bump when BITS_PER_BITMAPWORD
is changed, so probably not a great idea anyway.

I suppose the only reason we haven't done this yet is nobody has needed
it. Sounds like its time has come.
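
To make the "naive implementation" idea above concrete, here is a minimal standalone sketch, not PostgreSQL code. `DemoBitmapset`, `DemoBitmapsetNode`, `demo_add_member` and `demo_out_members` are hypothetical names invented for illustration, the set is limited to members 0..63, and the `(b ...)` text layout is only an assumed example format. The point it shows is that printing individual members gives output that does not depend on the word size used to store the set.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a Bitmapset: holds members 0..63 only. */
typedef struct DemoBitmapset
{
	uint64_t	word;
} DemoBitmapset;

/* Hypothetical wrapper node: a type tag followed by the lone bitmapset. */
typedef struct DemoBitmapsetNode
{
	int			type;			/* placeholder for a NodeTag value */
	DemoBitmapset set;
} DemoBitmapsetNode;

/* Add one member to the wrapped set. */
static void
demo_add_member(DemoBitmapsetNode *node, int member)
{
	assert(member >= 0 && member < 64);
	node->set.word |= UINT64_C(1) << member;
}

/*
 * Naive output: print each member individually, in ascending order.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the buffer is too small.
 */
static int
demo_out_members(const DemoBitmapsetNode *node, char *buf, size_t buflen)
{
	size_t		used;
	int			n;

	n = snprintf(buf, buflen, "(b");
	if (n < 0 || (size_t) n >= buflen)
		return -1;
	used = (size_t) n;

	for (int m = 0; m < 64; m++)
	{
		if ((node->set.word & (UINT64_C(1) << m)) == 0)
			continue;
		n = snprintf(buf + used, buflen - used, " %d", m);
		if (n < 0 || (size_t) n >= buflen - used)
			return -1;
		used += (size_t) n;
	}

	n = snprintf(buf + used, buflen - used, ")");
	if (n < 0 || (size_t) n >= buflen - used)
		return -1;
	used += (size_t) n;

	return (int) used;
}
```

A "words" based format would instead dump `node->set.word` directly, which is more compact for dense sets but ties the stored text to the word width.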

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#30David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#28)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 17 July 2018 at 12:21, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 16 July 2018 at 06:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I started to look at this patch. I think this is basically the right
direction to go in, but I'm not terribly happy with the details of the
data structure design.

I've made an attempt at addressing the issues that I understood.

I've attached a patch intended for master which is just v2 based on
post 5220bb7533.

No other changes were made.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

v3-0001-Fix-run-time-partition-pruning-for-UNION-ALL-pare.patch (application/octet-stream)
From 583c900e3e7c257f387c8e7e06dac2950bd81122 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 26 Jun 2018 23:49:31 +1200
Subject: [PATCH v3] Fix run-time partition pruning for UNION ALL parents

The run-time partition pruning code added in 499be013d was unaware that the
partitioned_rels list that's built during add_paths_to_append_rel() could be
non-empty for relations other than just partitioned relations.  It can also
be set for UNION ALL parents where one or more union children are
partitioned tables.  This can cause the relids from multiple partition
hierarchies to become mixed in the partitioned_rels list.

This commit resolves that issue by never mixing the relids from different
UNION ALL children.  Instead we maintain a List of Lists containing the
partitioned relids.  This commit also adds all the new required code in
both the planner and executor to allow run-time pruning to work for UNION
ALL parents which query multiple partitioned tables.
---
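
As a rough illustration of the bookkeeping described in the commit message, the following standalone sketch keeps one entry per partition hierarchy and derives the set of subplans that must never be pruned. It is not the patch's code: `DemoHierarchy` and `demo_other_subplans` are hypothetical names, and a plain 64-bit mask stands in for a Bitmapset, so at most 64 subplans are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one entry per partition hierarchy under a UNION ALL parent. */
typedef struct DemoHierarchy
{
	uint64_t	matched_subplans;	/* bit i set => subplan i belongs here */
	int			prunable;			/* nonzero if run-time pruning is useful */
} DemoHierarchy;

/*
 * Return the subplans that must never be pruned: every subplan index in
 * 0..nsubplans-1 that is not covered by a hierarchy with useful run-time
 * pruning.  This covers non-partitioned UNION ALL children as well as
 * hierarchies for which run-time pruning was judged useless.
 */
static uint64_t
demo_other_subplans(const DemoHierarchy *hier, int nhier, int nsubplans)
{
	uint64_t	allmatched = 0;
	uint64_t	every;

	assert(nsubplans >= 0 && nsubplans <= 64);

	/* Union the subplans of each hierarchy kept separately in the list. */
	for (int i = 0; i < nhier; i++)
	{
		if (hier[i].prunable)
			allmatched |= hier[i].matched_subplans;
	}

	/* Build the full range, then remove everything that was matched. */
	every = (nsubplans == 64) ? UINT64_MAX
		: ((UINT64_C(1) << nsubplans) - 1);

	return every & ~allmatched;
}
```

For example, with five subplans where subplan 0 belongs to one hierarchy, subplan 1 is a non-partitioned child, and subplans 2 to 4 belong to a second hierarchy, only subplan 1 ends up in the "other" set when both hierarchies have useful pruning quals.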
 src/backend/executor/execPartition.c          | 394 +++++++++++++++-----------
 src/backend/executor/nodeAppend.c             |   4 +-
 src/backend/executor/nodeMergeAppend.c        |   4 +-
 src/backend/nodes/copyfuncs.c                 |  18 +-
 src/backend/nodes/outfuncs.c                  |  18 +-
 src/backend/nodes/readfuncs.c                 |  17 +-
 src/backend/optimizer/path/allpaths.c         |  27 +-
 src/backend/optimizer/plan/createplan.c       |  52 +++-
 src/backend/optimizer/plan/planner.c          |   1 +
 src/backend/partitioning/partprune.c          | 249 ++++++++++++----
 src/include/executor/execPartition.h          |  33 ++-
 src/include/nodes/nodes.h                     |   1 +
 src/include/nodes/plannodes.h                 |  43 ++-
 src/include/partitioning/partprune.h          |   3 +-
 src/test/regress/expected/partition_prune.out |  90 ++++++
 src/test/regress/sql/partition_prune.sql      |   8 +
 16 files changed, 680 insertions(+), 282 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7a4665cc4e..dac789d414 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -48,8 +48,8 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 bool *isnull,
 									 int maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_matching_subplans_recurse(PartitionPruneState *prunestate,
-							   PartitionPruningData *pprune,
+static void find_matching_subplans_recurse(PartitionPruningData *pprune,
+							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans);
 
@@ -1394,34 +1394,42 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  *
  * 'planstate' is the parent plan node's execution state.
  *
- * 'partitionpruneinfo' is a List of PartitionPruneInfos as generated by
+ * 'partitionpruneinfo' is a PartitionPruneInfo as generated by
  * make_partition_pruneinfo.  Here we build a PartitionPruneState containing a
- * PartitionPruningData for each item in that List.  This data can be re-used
- * each time we re-evaluate which partitions match the pruning steps provided
- * in each PartitionPruneInfo.
+ * PartitionPruningData for each item in partitionpruneinfo->prune_infos.  In
+ * turn, a PartitionedRelPruningData is created for each
+ * PartitionedRelPruneInfo stored in each 'prune_infos'.  This two-level system
+ * is required in order to support run-time pruning with UNION ALL parents
+ * containing one or more partitioned tables as children.  The data stored in
+ * each PartitionedRelPruningData can be re-used each time we re-evaluate
+ * which partitions match the pruning steps provided in each
+ * PartitionedRelPruneInfo.
  */
 PartitionPruneState *
-ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
+ExecCreatePartitionPruneState(PlanState *planstate,
+							  PartitionPruneInfo *partitionpruneinfo)
 {
 	PartitionPruneState *prunestate;
-	PartitionPruningData *prunedata;
 	ListCell   *lc;
+	int			n_part_hierarchies;
 	int			i;
 
-	Assert(partitionpruneinfo != NIL);
+	Assert(partitionpruneinfo != NULL);
+
+	n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
 
 	/*
 	 * Allocate the data structure
 	 */
-	prunestate = (PartitionPruneState *) palloc(sizeof(PartitionPruneState));
-	prunedata = (PartitionPruningData *)
-		palloc(sizeof(PartitionPruningData) * list_length(partitionpruneinfo));
+	prunestate = (PartitionPruneState *)
+		palloc(offsetof(PartitionPruneState, partprunedata) +
+			   sizeof(PartitionPruningData *) * n_part_hierarchies);
 
-	prunestate->partprunedata = prunedata;
-	prunestate->num_partprunedata = list_length(partitionpruneinfo);
+	prunestate->num_partprunedata = n_part_hierarchies;
 	prunestate->do_initial_prune = false;	/* may be set below */
 	prunestate->do_exec_prune = false;	/* may be set below */
 	prunestate->execparamids = NULL;
+	prunestate->other_subplans = bms_copy(partitionpruneinfo->other_subplans);
 
 	/*
 	 * Create a short-term memory context which we'll use when making calls to
@@ -1435,113 +1443,129 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 							  ALLOCSET_DEFAULT_SIZES);
 
 	i = 0;
-	foreach(lc, partitionpruneinfo)
+	foreach(lc, partitionpruneinfo->prune_infos)
 	{
-		PartitionPruneInfo *pinfo = castNode(PartitionPruneInfo, lfirst(lc));
-		PartitionPruningData *pprune = &prunedata[i];
-		PartitionPruneContext *context = &pprune->context;
-		PartitionDesc partdesc;
-		PartitionKey partkey;
-		int			partnatts;
-		int			n_steps;
 		ListCell   *lc2;
-
-		/*
-		 * We must copy the subplan_map rather than pointing directly to the
-		 * plan's version, as we may end up making modifications to it later.
-		 */
-		pprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
-		memcpy(pprune->subplan_map, pinfo->subplan_map,
-			   sizeof(int) * pinfo->nparts);
-
-		/* We can use the subpart_map verbatim, since we never modify it */
-		pprune->subpart_map = pinfo->subpart_map;
-
-		/* present_parts is also subject to later modification */
-		pprune->present_parts = bms_copy(pinfo->present_parts);
-
-		/*
-		 * We need to hold a pin on the partitioned table's relcache entry so
-		 * that we can rely on its copies of the table's partition key and
-		 * partition descriptor.  We need not get a lock though; one should
-		 * have been acquired already by InitPlan or
-		 * ExecLockNonLeafAppendTables.
-		 */
-		context->partrel = relation_open(pinfo->reloid, NoLock);
-
-		partkey = RelationGetPartitionKey(context->partrel);
-		partdesc = RelationGetPartitionDesc(context->partrel);
-		n_steps = list_length(pinfo->pruning_steps);
-
-		context->strategy = partkey->strategy;
-		context->partnatts = partnatts = partkey->partnatts;
-		context->nparts = pinfo->nparts;
-		context->boundinfo = partdesc->boundinfo;
-		context->partcollation = partkey->partcollation;
-		context->partsupfunc = partkey->partsupfunc;
-
-		/* We'll look up type-specific support functions as needed */
-		context->stepcmpfuncs = (FmgrInfo *)
-			palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
-
-		context->ppccontext = CurrentMemoryContext;
-		context->planstate = planstate;
-
-		/* Initialize expression state for each expression we need */
-		context->exprstates = (ExprState **)
-			palloc0(sizeof(ExprState *) * n_steps * partnatts);
-		foreach(lc2, pinfo->pruning_steps)
+		List	   *partrelpruneinfos = lfirst_node(List, lc);
+		PartitionPruningData *prunedata;
+		int			npartrelpruneinfos = list_length(partrelpruneinfos);
+		int			j;
+
+		prunedata = palloc(offsetof(PartitionPruningData, partrelprunedata) +
+						   npartrelpruneinfos * sizeof(PartitionedRelPruningData));
+		prunestate->partprunedata[i] = prunedata;
+		prunedata->num_partrelprunedata = npartrelpruneinfos;
+
+		j = 0;
+		foreach(lc2, partrelpruneinfos)
 		{
-			PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc2);
+			PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
+			PartitionedRelPruningData *prelprune = &prunedata->partrelprunedata[j];
+			PartitionPruneContext *context = &prelprune->context;
+			PartitionDesc partdesc;
+			PartitionKey partkey;
+			int			partnatts;
+			int			n_steps;
 			ListCell   *lc3;
-			int			keyno;
 
-			/* not needed for other step kinds */
-			if (!IsA(step, PartitionPruneStepOp))
-				continue;
+			/*
+			 * We must copy the subplan_map rather than pointing directly to
+			 * the plan's version, as we may end up making modifications to it
+			 * later.
+			 */
+			prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
+			memcpy(prelprune->subplan_map, pinfo->subplan_map,
+				   sizeof(int) * pinfo->nparts);
 
-			Assert(list_length(step->exprs) <= partnatts);
+			/* We can use the subpart_map verbatim, since we never modify it */
+			prelprune->subpart_map = pinfo->subpart_map;
 
-			keyno = 0;
-			foreach(lc3, step->exprs)
+			/* present_parts is also subject to later modification */
+			prelprune->present_parts = bms_copy(pinfo->present_parts);
+
+			/*
+			 * We need to hold a pin on the partitioned table's relcache entry
+			 * so that we can rely on its copies of the table's partition key
+			 * and partition descriptor.  We need not get a lock though; one
+			 * should have been acquired already by InitPlan or
+			 * ExecLockNonLeafAppendTables.
+			 */
+			context->partrel = relation_open(pinfo->reloid, NoLock);
+
+			partkey = RelationGetPartitionKey(context->partrel);
+			partdesc = RelationGetPartitionDesc(context->partrel);
+			n_steps = list_length(pinfo->pruning_steps);
+
+			context->strategy = partkey->strategy;
+			context->partnatts = partnatts = partkey->partnatts;
+			context->nparts = pinfo->nparts;
+			context->boundinfo = partdesc->boundinfo;
+			context->partcollation = partkey->partcollation;
+			context->partsupfunc = partkey->partsupfunc;
+
+			/* We'll look up type-specific support functions as needed */
+			context->stepcmpfuncs = (FmgrInfo *)
+				palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
+
+			context->ppccontext = CurrentMemoryContext;
+			context->planstate = planstate;
+
+			/* Initialize expression state for each expression we need */
+			context->exprstates = (ExprState **)
+				palloc0(sizeof(ExprState *) * n_steps * partnatts);
+			foreach(lc3, pinfo->pruning_steps)
 			{
-				Expr	   *expr = (Expr *) lfirst(lc3);
+				PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc3);
+				ListCell   *lc4;
+				int			keyno;
 
-				/* not needed for Consts */
-				if (!IsA(expr, Const))
+				/* not needed for other step kinds */
+				if (!IsA(step, PartitionPruneStepOp))
+					continue;
+
+				Assert(list_length(step->exprs) <= partnatts);
+
+				keyno = 0;
+				foreach(lc4, step->exprs)
 				{
-					int			stateidx = PruneCxtStateIdx(partnatts,
-															step->step.step_id,
-															keyno);
+					Expr	   *expr = (Expr *) lfirst(lc4);
+
+					/* not needed for Consts */
+					if (!IsA(expr, Const))
+					{
+						int			stateidx = PruneCxtStateIdx(partnatts,
+																step->step.step_id,
+																keyno);
 
-					context->exprstates[stateidx] =
-						ExecInitExpr(expr, context->planstate);
+						context->exprstates[stateidx] =
+							ExecInitExpr(expr, context->planstate);
+					}
+					keyno++;
 				}
-				keyno++;
 			}
-		}
 
-		/* Array is not modified at runtime, so just point to plan's copy */
-		context->exprhasexecparam = pinfo->hasexecparam;
+			/* Array is not modified at runtime, so just point to plan's copy */
+			context->exprhasexecparam = pinfo->hasexecparam;
 
-		pprune->pruning_steps = pinfo->pruning_steps;
-		pprune->do_initial_prune = pinfo->do_initial_prune;
-		pprune->do_exec_prune = pinfo->do_exec_prune;
+			prelprune->pruning_steps = pinfo->pruning_steps;
+			prelprune->do_initial_prune = pinfo->do_initial_prune;
+			prelprune->do_exec_prune = pinfo->do_exec_prune;
 
-		/* Record if pruning would be useful at any level */
-		prunestate->do_initial_prune |= pinfo->do_initial_prune;
-		prunestate->do_exec_prune |= pinfo->do_exec_prune;
+			/* Record if pruning would be useful at any level */
+			prunestate->do_initial_prune |= pinfo->do_initial_prune;
+			prunestate->do_exec_prune |= pinfo->do_exec_prune;
 
-		/*
-		 * Accumulate the IDs of all PARAM_EXEC Params affecting the
-		 * partitioning decisions at this plan node.
-		 */
-		prunestate->execparamids = bms_add_members(prunestate->execparamids,
-												   pinfo->execparamids);
+			/*
+			 * Accumulate the IDs of all PARAM_EXEC Params affecting the
+			 * partitioning decisions at this plan node.
+			 */
+			prunestate->execparamids = bms_add_members(prunestate->execparamids,
+													   pinfo->execparamids);
 
+			j++;
+		}
 		i++;
 	}
-
 	return prunestate;
 }
 
@@ -1555,13 +1579,17 @@ ExecCreatePartitionPruneState(PlanState *planstate, List *partitionpruneinfo)
 void
 ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 {
+	PartitionPruningData **partprunedata = prunestate->partprunedata;
 	int			i;
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune = &prunestate->partprunedata[i];
+		PartitionPruningData *pprune = partprunedata[i];
+		PartitionedRelPruningData *prunedata = pprune->partrelprunedata;
+		int			j;
 
-		relation_close(pprune->context.partrel, NoLock);
+		for (j = 0; j < pprune->num_partrelprunedata; j++)
+			relation_close(prunedata[j].context.partrel, NoLock);
 	}
 }
 
@@ -1581,31 +1609,42 @@ ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 Bitmapset *
 ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 {
-	PartitionPruningData *pprune;
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
+	int			i;
 
 	Assert(prunestate->do_initial_prune);
 
-	pprune = prunestate->partprunedata;
-
 	/*
 	 * Switch to a temp context to avoid leaking memory in the executor's
 	 * memory context.
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	/* Perform pruning without using PARAM_EXEC Params */
-	find_matching_subplans_recurse(prunestate, pprune, true, &result);
+	for (i = 0; i < prunestate->num_partprunedata; i++)
+	{
+		PartitionPruningData *pprune;
+		PartitionedRelPruningData *prelprune;
+
+		pprune = prunestate->partprunedata[i];
+		prelprune = &pprune->partrelprunedata[0];
+
+		/* Perform pruning without using PARAM_EXEC Params */
+		find_matching_subplans_recurse(pprune, prelprune, true, &result);
+
+		/* Expression eval may have used space in node's ps_ExprContext too */
+		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
 	result = bms_copy(result);
 
+	/* Add in any subplans which partition pruning didn't account for */
+	result = bms_add_members(result, prunestate->other_subplans);
+
 	MemoryContextReset(prunestate->prune_context);
-	/* Expression eval may have used space in node's ps_ExprContext too */
-	ResetExprContext(pprune->context.planstate->ps_ExprContext);
 
 	/*
 	 * If any subplans were pruned, we must re-sequence the subplan indexes so
@@ -1633,59 +1672,70 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 		}
 
 		/*
-		 * Now we can update each PartitionPruneInfo's subplan_map with new
-		 * subplan indexes.  We must also recompute its present_parts bitmap.
-		 * We perform this loop in back-to-front order so that we determine
-		 * present_parts for the lowest-level partitioned tables first.  This
-		 * way we can tell whether a sub-partitioned table's partitions were
-		 * entirely pruned so we can exclude that from 'present_parts'.
+		 * Now we can update each PartitionedRelPruneInfo's subplan_map with
+		 * new subplan indexes.  We must also recompute its present_parts
+		 * bitmap.  We perform this loop in back-to-front order so that we
+		 * determine present_parts for the lowest-level partitioned tables
+		 * first.  This way we can tell whether a sub-partitioned table's
+		 * partitions were entirely pruned so we can exclude that from
+		 * 'present_parts'.
 		 */
-		for (i = prunestate->num_partprunedata - 1; i >= 0; i--)
+
+		for (i = 0; i < prunestate->num_partprunedata; i++)
 		{
-			int			nparts;
 			int			j;
+			PartitionPruningData *prunedata;
 
-			pprune = &prunestate->partprunedata[i];
-			nparts = pprune->context.nparts;
-			/* We just rebuild present_parts from scratch */
-			bms_free(pprune->present_parts);
-			pprune->present_parts = NULL;
+			prunedata = prunestate->partprunedata[i];
 
-			for (j = 0; j < nparts; j++)
+			for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
 			{
-				int			oldidx = pprune->subplan_map[j];
-				int			subidx;
+				PartitionedRelPruningData *pprune;
+				int			nparts;
+				int			k;
 
-				/*
-				 * If this partition existed as a subplan then change the old
-				 * subplan index to the new subplan index.  The new index may
-				 * become -1 if the partition was pruned above, or it may just
-				 * come earlier in the subplan list due to some subplans being
-				 * removed earlier in the list.  If it's a subpartition, add
-				 * it to present_parts unless it's entirely pruned.
-				 */
-				if (oldidx >= 0)
-				{
-					Assert(oldidx < nsubplans);
-					pprune->subplan_map[j] = new_subplan_indexes[oldidx];
+				pprune = &prunedata->partrelprunedata[j];
+				nparts = pprune->context.nparts;
+				/* We just rebuild present_parts from scratch */
+				bms_free(pprune->present_parts);
+				pprune->present_parts = NULL;
 
-					if (new_subplan_indexes[oldidx] >= 0)
-						pprune->present_parts =
-							bms_add_member(pprune->present_parts, j);
-				}
-				else if ((subidx = pprune->subpart_map[j]) >= 0)
+				for (k = 0; k < nparts; k++)
 				{
-					PartitionPruningData *subprune;
+					int			oldidx = pprune->subplan_map[k];
+					int			subidx;
 
-					subprune = &prunestate->partprunedata[subidx];
+					/*
+					 * If this partition existed as a subplan then change the
+					 * old subplan index to the new subplan index.  The new
+					 * index may become -1 if the partition was pruned above,
+					 * or it may just come earlier in the subplan list due to
+					 * some subplans being removed earlier in the list.  If
+					 * it's a subpartition, add it to present_parts unless
+					 * it's entirely pruned.
+					 */
+					if (oldidx >= 0)
+					{
+						Assert(oldidx < nsubplans);
+						pprune->subplan_map[k] = new_subplan_indexes[oldidx];
 
-					if (!bms_is_empty(subprune->present_parts))
-						pprune->present_parts =
-							bms_add_member(pprune->present_parts, j);
+						if (new_subplan_indexes[oldidx] >= 0)
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, k);
+					}
+					else if ((subidx = pprune->subpart_map[k]) >= 0)
+					{
+						PartitionedRelPruningData *subprune;
+
+						subprune = &prunedata->partrelprunedata[subidx];
+
+						if (!bms_is_empty(subprune->present_parts))
+							pprune->present_parts =
+								bms_add_member(pprune->present_parts, k);
+					}
 				}
 			}
 		}
-
 		pfree(new_subplan_indexes);
 	}
 
@@ -1702,11 +1752,9 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 Bitmapset *
 ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 {
-	PartitionPruningData *pprune;
 	MemoryContext oldcontext;
 	Bitmapset  *result = NULL;
-
-	pprune = prunestate->partprunedata;
+	int			i;
 
 	/*
 	 * Switch to a temp context to avoid leaking memory in the executor's
@@ -1714,16 +1762,29 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 	 */
 	oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
 
-	find_matching_subplans_recurse(prunestate, pprune, false, &result);
+	for (i = 0; i < prunestate->num_partprunedata; i++)
+	{
+		PartitionPruningData *pprune;
+		PartitionedRelPruningData *prelprune;
+
+		pprune = prunestate->partprunedata[i];
+		prelprune = &pprune->partrelprunedata[0];
+
+		find_matching_subplans_recurse(pprune, prelprune, false, &result);
+
+		/* Expression eval may have used space in node's ps_ExprContext too */
+		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
+	}
 
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
 	result = bms_copy(result);
 
+	/* Add in any subplans which partition pruning didn't account for */
+	result = bms_add_members(result, prunestate->other_subplans);
+
 	MemoryContextReset(prunestate->prune_context);
-	/* Expression eval may have used space in node's ps_ExprContext too */
-	ResetExprContext(pprune->context.planstate->ps_ExprContext);
 
 	return result;
 }
@@ -1736,8 +1797,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
  * Adds valid (non-prunable) subplan IDs to *validsubplans
  */
 static void
-find_matching_subplans_recurse(PartitionPruneState *prunestate,
-							   PartitionPruningData *pprune,
+find_matching_subplans_recurse(PartitionPruningData *pprune,
+							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans)
 {
@@ -1748,15 +1809,16 @@ find_matching_subplans_recurse(PartitionPruneState *prunestate,
 	check_stack_depth();
 
 	/* Only prune if pruning would be useful at this level. */
-	if (initial_prune ? pprune->do_initial_prune : pprune->do_exec_prune)
+	if (initial_prune ? prelprune->do_initial_prune :
+		prelprune->do_exec_prune)
 	{
-		PartitionPruneContext *context = &pprune->context;
+		PartitionPruneContext *context = &prelprune->context;
 
 		/* Set whether we can evaluate PARAM_EXEC Params or not */
 		context->evalexecparams = !initial_prune;
 
 		partset = get_matching_partitions(context,
-										  pprune->pruning_steps);
+										  prelprune->pruning_steps);
 	}
 	else
 	{
@@ -1764,23 +1826,23 @@ find_matching_subplans_recurse(PartitionPruneState *prunestate,
 		 * If no pruning is to be done, just include all partitions at this
 		 * level.
 		 */
-		partset = pprune->present_parts;
+		partset = prelprune->present_parts;
 	}
 
 	/* Translate partset into subplan indexes */
 	i = -1;
 	while ((i = bms_next_member(partset, i)) >= 0)
 	{
-		if (pprune->subplan_map[i] >= 0)
+		if (prelprune->subplan_map[i] >= 0)
 			*validsubplans = bms_add_member(*validsubplans,
-											pprune->subplan_map[i]);
+											prelprune->subplan_map[i]);
 		else
 		{
-			int			partidx = pprune->subpart_map[i];
+			int			partidx = prelprune->subpart_map[i];
 
 			if (partidx >= 0)
-				find_matching_subplans_recurse(prunestate,
-											   &prunestate->partprunedata[partidx],
+				find_matching_subplans_recurse(pprune,
+											   &pprune->partrelprunedata[partidx],
 											   initial_prune, validsubplans);
 			else
 			{
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index e7188b2d31..488e2db1bd 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -129,7 +129,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
 
 	/* If run-time partition pruning is enabled, then set that up now */
-	if (node->part_prune_infos != NIL)
+	if (node->part_prune_info != NULL)
 	{
 		PartitionPruneState *prunestate;
 
@@ -138,7 +138,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 
 		/* Create the working data structure for pruning. */
 		prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
-												   node->part_prune_infos);
+												   node->part_prune_info);
 		appendstate->as_prune_state = prunestate;
 
 		/* Perform an initial partition prune, if required. */
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index ec8a49c3a8..cc5404dfc6 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -90,7 +90,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	mergestate->ms_noopscan = false;
 
 	/* If run-time partition pruning is enabled, then set that up now */
-	if (node->part_prune_infos != NIL)
+	if (node->part_prune_info != NULL)
 	{
 		PartitionPruneState *prunestate;
 
@@ -98,7 +98,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 		ExecAssignExprContext(estate, &mergestate->ps);
 
 		prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
-												   node->part_prune_infos);
+												   node->part_prune_info);
 		mergestate->ms_prune_state = prunestate;
 
 		/* Perform an initial partition prune, if required. */
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 96836ef19c..45a3449302 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -245,7 +245,7 @@ _copyAppend(const Append *from)
 	COPY_NODE_FIELD(appendplans);
 	COPY_SCALAR_FIELD(first_partial_plan);
 	COPY_NODE_FIELD(partitioned_rels);
-	COPY_NODE_FIELD(part_prune_infos);
+	COPY_NODE_FIELD(part_prune_info);
 
 	return newnode;
 }
@@ -273,7 +273,7 @@ _copyMergeAppend(const MergeAppend *from)
 	COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
 	COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
 	COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
-	COPY_NODE_FIELD(part_prune_infos);
+	COPY_NODE_FIELD(part_prune_info);
 
 	return newnode;
 }
@@ -1182,6 +1182,17 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
 {
 	PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
 
+	COPY_NODE_FIELD(prune_infos);
+	COPY_BITMAPSET_FIELD(other_subplans);
+
+	return newnode;
+}
+
+static PartitionedRelPruneInfo *
+_copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
+{
+	PartitionedRelPruneInfo *newnode = makeNode(PartitionedRelPruneInfo);
+
 	COPY_SCALAR_FIELD(reloid);
 	COPY_NODE_FIELD(pruning_steps);
 	COPY_BITMAPSET_FIELD(present_parts);
@@ -4908,6 +4919,9 @@ copyObjectImpl(const void *from)
 		case T_PartitionPruneInfo:
 			retval = _copyPartitionPruneInfo(from);
 			break;
+		case T_PartitionedRelPruneInfo:
+			retval = _copyPartitionedRelPruneInfo(from);
+			break;
 		case T_PartitionPruneStepOp:
 			retval = _copyPartitionPruneStepOp(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a6454ce28b..6269f474d2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -402,7 +402,7 @@ _outAppend(StringInfo str, const Append *node)
 	WRITE_NODE_FIELD(appendplans);
 	WRITE_INT_FIELD(first_partial_plan);
 	WRITE_NODE_FIELD(partitioned_rels);
-	WRITE_NODE_FIELD(part_prune_infos);
+	WRITE_NODE_FIELD(part_prune_info);
 }
 
 static void
@@ -435,7 +435,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
 	for (i = 0; i < node->numCols; i++)
 		appendStringInfo(str, " %s", booltostr(node->nullsFirst[i]));
 
-	WRITE_NODE_FIELD(part_prune_infos);
+	WRITE_NODE_FIELD(part_prune_info);
 }
 
 static void
@@ -1014,10 +1014,19 @@ _outPlanRowMark(StringInfo str, const PlanRowMark *node)
 
 static void
 _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
+{
+	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
+
+	WRITE_NODE_FIELD(prune_infos);
+	WRITE_BITMAPSET_FIELD(other_subplans);
+}
+
+static void
+_outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
 {
 	int			i;
 
-	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
+	WRITE_NODE_TYPE("PARTITIONEDRELPRUNEINFO");
 
 	WRITE_OID_FIELD(reloid);
 	WRITE_NODE_FIELD(pruning_steps);
@@ -3831,6 +3840,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PartitionPruneInfo:
 				_outPartitionPruneInfo(str, obj);
 				break;
+			case T_PartitionedRelPruneInfo:
+				_outPartitionedRelPruneInfo(str, obj);
+				break;
 			case T_PartitionPruneStepOp:
 				_outPartitionPruneStepOp(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 9a01eb6b63..3254524223 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1612,7 +1612,7 @@ _readAppend(void)
 	READ_NODE_FIELD(appendplans);
 	READ_INT_FIELD(first_partial_plan);
 	READ_NODE_FIELD(partitioned_rels);
-	READ_NODE_FIELD(part_prune_infos);
+	READ_NODE_FIELD(part_prune_info);
 
 	READ_DONE();
 }
@@ -1634,7 +1634,7 @@ _readMergeAppend(void)
 	READ_OID_ARRAY(sortOperators, local_node->numCols);
 	READ_OID_ARRAY(collations, local_node->numCols);
 	READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
-	READ_NODE_FIELD(part_prune_infos);
+	READ_NODE_FIELD(part_prune_info);
 
 	READ_DONE();
 }
@@ -2329,6 +2329,17 @@ _readPartitionPruneInfo(void)
 {
 	READ_LOCALS(PartitionPruneInfo);
 
+	READ_NODE_FIELD(prune_infos);
+	READ_BITMAPSET_FIELD(other_subplans);
+
+	READ_DONE();
+}
+
+static PartitionedRelPruneInfo *
+_readPartitionedRelPruneInfo(void)
+{
+	READ_LOCALS(PartitionedRelPruneInfo);
+
 	READ_OID_FIELD(reloid);
 	READ_NODE_FIELD(pruning_steps);
 	READ_BITMAPSET_FIELD(present_parts);
@@ -2726,6 +2737,8 @@ parseNodeString(void)
 		return_value = _readPlanRowMark();
 	else if (MATCH("PARTITIONPRUNEINFO", 18))
 		return_value = _readPartitionPruneInfo();
+	else if (MATCH("PARTITIONEDRELPRUNEINFO", 23))
+		return_value = _readPartitionedRelPruneInfo();
 	else if (MATCH("PARTITIONPRUNESTEPOP", 20))
 		return_value = _readPartitionPruneStepOp();
 	else if (MATCH("PARTITIONPRUNESTEPCOMBINE", 25))
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f04c30af45..0e80aeb65c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1388,7 +1388,6 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	List	   *all_child_outers = NIL;
 	ListCell   *l;
 	List	   *partitioned_rels = NIL;
-	bool		build_partitioned_rels = false;
 	double		partial_rows = -1;
 
 	/* If appropriate, consider parallel append */
@@ -1413,10 +1412,11 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (rel->part_scheme != NULL)
 	{
 		if (IS_SIMPLE_REL(rel))
-			partitioned_rels = rel->partitioned_child_rels;
+			partitioned_rels = list_make1(rel->partitioned_child_rels);
 		else if (IS_JOIN_REL(rel))
 		{
 			int			relid = -1;
+			List	   *partrels = NIL;
 
 			/*
 			 * For a partitioned joinrel, concatenate the component rels'
@@ -1430,16 +1430,16 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 				component = root->simple_rel_array[relid];
 				Assert(component->part_scheme != NULL);
 				Assert(list_length(component->partitioned_child_rels) >= 1);
-				partitioned_rels =
-					list_concat(partitioned_rels,
+				partrels =
+					list_concat(partrels,
 								list_copy(component->partitioned_child_rels));
 			}
+
+			partitioned_rels = list_make1(partrels);
 		}
 
 		Assert(list_length(partitioned_rels) >= 1);
 	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-		build_partitioned_rels = true;
 
 	/*
 	 * For every non-dummy child, remember the cheapest path.  Also, identify
@@ -1453,17 +1453,12 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		Path	   *cheapest_partial_path = NULL;
 
 		/*
-		 * If we need to build partitioned_rels, accumulate the partitioned
-		 * rels for this child.  We must ensure that parents are always listed
-		 * before their child partitioned tables.
+		 * For UNION ALLs with non-empty partitioned_child_rels, accumulate
+		 * the Lists of child relations.
 		 */
-		if (build_partitioned_rels)
-		{
-			List	   *cprels = childrel->partitioned_child_rels;
-
-			partitioned_rels = list_concat(partitioned_rels,
-										   list_copy(cprels));
-		}
+		if (rel->rtekind == RTE_SUBQUERY && childrel->partitioned_child_rels != NIL)
+			partitioned_rels = lappend(partitioned_rels,
+									   childrel->partitioned_child_rels);
 
 		/*
 		 * If child has an unparameterized cheapest-total path, add that to
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 0a0bec3bfc..f9e6ad3ab7 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -124,6 +124,7 @@ static BitmapHeapScan *create_bitmap_scan_plan(PlannerInfo *root,
 static Plan *create_bitmap_subplan(PlannerInfo *root, Path *bitmapqual,
 					  List **qual, List **indexqual, List **indexECs);
 static void bitmap_subplan_mark_shared(Plan *plan);
+static List *flatten_partitioned_rels(List *partitioned_rels);
 static TidScan *create_tidscan_plan(PlannerInfo *root, TidPath *best_path,
 					List *tlist, List *scan_clauses);
 static SubqueryScan *create_subqueryscan_plan(PlannerInfo *root,
@@ -202,7 +203,8 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
 static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
 				   Index scanrelid, int wtParam);
 static Append *make_append(List *appendplans, int first_partial_plan,
-			List *tlist, List *partitioned_rels, List *partpruneinfos);
+			List *tlist, List *partitioned_rels,
+			PartitionPruneInfo *partpruneinfo);
 static RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree,
 					 Plan *righttree,
@@ -1030,7 +1032,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	List	   *subplans = NIL;
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
-	List	   *partpruneinfos = NIL;
+	PartitionPruneInfo *partpruneinfo = NULL;
 
 	/*
 	 * The subpaths list could be empty, if every child was proven empty by
@@ -1093,8 +1095,8 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 		}
 
 		if (prunequal != NIL)
-			partpruneinfos =
-				make_partition_pruneinfo(root,
+			partpruneinfo =
+				make_partition_pruneinfo(root, rel,
 										 best_path->partitioned_rels,
 										 best_path->subpaths, prunequal);
 	}
@@ -1108,7 +1110,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 	plan = make_append(subplans, best_path->first_partial_path,
 					   tlist, best_path->partitioned_rels,
-					   partpruneinfos);
+					   partpruneinfo);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
@@ -1132,7 +1134,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 	List	   *subplans = NIL;
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
-	List	   *partpruneinfos = NIL;
+	PartitionPruneInfo *partpruneinfo = NULL;
 
 	/*
 	 * We don't have the actual creation of the MergeAppend node split out
@@ -1244,14 +1246,15 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 		}
 
 		if (prunequal != NIL)
-			partpruneinfos = make_partition_pruneinfo(root,
+			partpruneinfo = make_partition_pruneinfo(root, rel,
 													  best_path->partitioned_rels,
 													  best_path->subpaths, prunequal);
 	}
 
-	node->partitioned_rels = best_path->partitioned_rels;
+	node->partitioned_rels =
+		flatten_partitioned_rels(best_path->partitioned_rels);
 	node->mergeplans = subplans;
-	node->part_prune_infos = partpruneinfos;
+	node->part_prune_info = partpruneinfo;
 
 	return (Plan *) node;
 }
@@ -5000,6 +5003,29 @@ bitmap_subplan_mark_shared(Plan *plan)
 		elog(ERROR, "unrecognized node type: %d", nodeTag(plan));
 }
 
+/*
+ * flatten_partitioned_rels
+ *		Convert List of Lists into a single List with all elements from the
+ *		sub-lists.
+ */
+static List *
+flatten_partitioned_rels(List *partitioned_rels)
+{
+	ListCell   *lc;
+	List	   *newlist = NIL;
+
+	foreach(lc, partitioned_rels)
+	{
+		List	   *sublist = lfirst(lc);
+
+		Assert(sublist->type == T_IntList);
+
+		newlist = list_concat(newlist, list_copy(sublist));
+	}
+
+	return newlist;
+}
+
 /*****************************************************************************
  *
  *	PLAN NODE BUILDING ROUTINES
@@ -5343,7 +5369,7 @@ make_foreignscan(List *qptlist,
 static Append *
 make_append(List *appendplans, int first_partial_plan,
 			List *tlist, List *partitioned_rels,
-			List *partpruneinfos)
+			PartitionPruneInfo *partpruneinfo)
 {
 	Append	   *node = makeNode(Append);
 	Plan	   *plan = &node->plan;
@@ -5354,8 +5380,8 @@ make_append(List *appendplans, int first_partial_plan,
 	plan->righttree = NULL;
 	node->appendplans = appendplans;
 	node->first_partial_plan = first_partial_plan;
-	node->partitioned_rels = partitioned_rels;
-	node->part_prune_infos = partpruneinfos;
+	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
+	node->part_prune_info = partpruneinfo;
 	return node;
 }
 
@@ -6512,7 +6538,7 @@ make_modifytable(PlannerInfo *root,
 	node->operation = operation;
 	node->canSetTag = canSetTag;
 	node->nominalRelation = nominalRelation;
-	node->partitioned_rels = partitioned_rels;
+	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
 	node->partColsUpdated = partColsUpdated;
 	node->resultRelations = resultRelations;
 	node->resultRelIndex = -1;	/* will be set correctly in setrefs.c */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df4ec448cb..fd06da98b9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1616,6 +1616,7 @@ inheritance_planner(PlannerInfo *root)
 		 * contain at least one member, that is, the root parent's index.
 		 */
 		Assert(list_length(partitioned_rels) >= 1);
+		partitioned_rels = list_make1(partitioned_rels);
 	}
 
 	/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 354eb0d4e6..9ce216c28b 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -111,7 +111,11 @@ typedef struct PruneStepResult
 	bool		scan_null;		/* Scan the partition for NULL values? */
 } PruneStepResult;
 
-
+static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
+							  RelOptInfo *parentrel,
+							  int *relid_subplan_map,
+							  List *partitioned_rels, List *prunequal,
+							  Bitmapset **matchedsubplans);
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
 					bool *contradictory);
 static List *gen_partprune_steps_internal(GeneratePruningStepsContext *context,
@@ -160,8 +164,8 @@ static PruneStepResult *get_matching_range_bounds(PartitionPruneContext *context
 						  FmgrInfo *partsupfunc, Bitmapset *nullkeys);
 static Bitmapset *pull_exec_paramids(Expr *expr);
 static bool pull_exec_paramids_walker(Node *node, Bitmapset **context);
-static bool analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps,
-					  int partnatts);
+static bool analyze_partkey_exprs(PartitionedRelPruneInfo *prelinfo,
+					  List *steps, int partnatts);
 static PruneStepResult *perform_pruning_base_step(PartitionPruneContext *context,
 						  PartitionPruneStepOp *opstep);
 static PruneStepResult *perform_pruning_combine_step(PartitionPruneContext *context,
@@ -176,38 +180,37 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context,
 
 /*
  * make_partition_pruneinfo
- *		Build List of PartitionPruneInfos, one for each partitioned rel.
- *		These can be used in the executor to allow additional partition
- *		pruning to take place.
- *
- * Here we generate partition pruning steps for 'prunequal' and also build a
- * data structure which allows mapping of partition indexes into 'subpaths'
- * indexes.
+ *		Builds a PartitionPruneInfo which can be used in the executor to allow
+ *		additional partition pruning to take place.  Returns NULL when
+ *		partition pruning would be useless.
  *
- * If no non-Const expressions are being compared to the partition key in any
- * of the 'partitioned_rels', then we return NIL to indicate no run-time
- * pruning should be performed.  Run-time pruning would be useless, since the
- * pruning done during planning will have pruned everything that can be.
+ * 'partitioned_rels' is a List containing Lists of partitioned relids.  Here
+ * we attempt to populate the PartitionPruneInfo adding a 'prune_infos' item
+ * for each item in the 'partitioned_rels' list.  However, some of the sets of
+ * partitioned relations may not require any run-time pruning.  In these cases
+ * we'll simply not add a 'prune_infos' item for that set and instead we'll
+ * add all the subplans which belong to that set into the PartitionPruneInfo's
+ * 'other_subplans' field.  Callers will likely never want to prune subplans
+ * which are mentioned in this field.
  */
-List *
-make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
-						 List *subpaths, List *prunequal)
+PartitionPruneInfo *
+make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
+						 List *partitioned_rels, List *subpaths,
+						 List *prunequal)
 {
-	RelOptInfo *targetpart = NULL;
-	List	   *pinfolist = NIL;
-	bool		doruntimeprune = false;
+	PartitionPruneInfo *pruneinfo;
+	Bitmapset  *allmatchedsubplans = NULL;
 	int		   *relid_subplan_map;
-	int		   *relid_subpart_map;
 	ListCell   *lc;
+	List	   *prunerelinfos;
 	int			i;
 
 	/*
-	 * Construct two temporary arrays to map from planner relids to subplan
-	 * and sub-partition indexes.  For convenience, we use 1-based indexes
-	 * here, so that zero can represent an un-filled array entry.
+	 * Construct a temporary array to map from planner relids to subplan
+	 * indexes.  For convenience, we use 1-based indexes here, so that zero
+	 * can represent an un-filled array entry.
 	 */
 	relid_subplan_map = palloc0(sizeof(int) * root->simple_rel_array_size);
-	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
 
 	/*
 	 * relid_subplan_map maps relid of a leaf partition to the index in
@@ -227,10 +230,110 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		relid_subplan_map[pathrel->relid] = i++;
 	}
 
+	Assert(partitioned_rels->type == T_List);
+
+	prunerelinfos = NIL;
+
+	/* We now build a PartitionedRelPruneInfo for each partitioned rel. */
+	foreach(lc, partitioned_rels)
+	{
+		List	   *rels = lfirst(lc);
+		List	   *prelinfolist;
+		Bitmapset  *matchedsubplans = NULL;
+
+		prelinfolist = make_partitionedrel_pruneinfo(root, parentrel,
+													 relid_subplan_map,
+													 rels, prunequal,
+													 &matchedsubplans);
+
+		/* When pruning is possible, record the matched subplans */
+		if (prelinfolist != NIL)
+		{
+			prunerelinfos = lappend(prunerelinfos, prelinfolist);
+			allmatchedsubplans = bms_join(matchedsubplans,
+										  allmatchedsubplans);
+		}
+	}
+
+	pfree(relid_subplan_map);
+
+	/*
+	 * If none of the partition hierarchies had any useful run-time pruning
+	 * quals then we can safely not bother with run-time pruning.
+	 */
+	if (prunerelinfos == NIL)
+		return NULL;
+
+	pruneinfo = makeNode(PartitionPruneInfo);
+	pruneinfo->prune_infos = prunerelinfos;
+
+	/*
+	 * Some subplans may not belong to any of the listed partitioned_rels.  This
+	 * can happen for UNION ALL queries which include a non-partitioned table.
+	 * We record all of the subplans which we didn't build any
+	 * PartitionedRelPruneInfo for so that callers can easily identify which
+	 * subplans should not be pruned.
+	 */
+	if (bms_num_members(allmatchedsubplans) < list_length(subpaths))
+	{
+		Bitmapset  *other_subplans;
+
+		/* Create an inverted set of allmatchedsubplans */
+		other_subplans = bms_add_range(NULL, 0, list_length(subpaths) - 1);
+		other_subplans = bms_del_members(other_subplans, allmatchedsubplans);
+
+		pruneinfo->other_subplans = other_subplans;
+	}
+	else
+		pruneinfo->other_subplans = NULL;
+
+	return pruneinfo;
+}
+
+/*
+ * make_partitionedrel_pruneinfo
+ *		Build a List of PartitionedRelPruneInfos, one for each partitioned
+ *		rel.  These can be used in the executor to allow additional partition
+ *		pruning to take place.
+ *
+ * Here we generate partition pruning steps for 'prunequal' and also build a
+ * data structure which allows mapping of partition indexes into 'subpaths'
+ * indexes.
+ *
+ * If no non-Const expressions are being compared to the partition key in any
+ * of the 'partitioned_rels', then we return NIL to indicate no run-time
+ * pruning should be performed.  Run-time pruning would be useless since the
+ * pruning done during planning will have pruned everything that can be.
+ *
+ * On non-NIL return, 'matchedsubplans' is set to the subplan indexes which
+ * were matched to this partition hierarchy.
+ */
+static List *
+make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
+							  int *relid_subplan_map,
+							  List *partitioned_rels, List *prunequal,
+							  Bitmapset **matchedsubplans)
+{
+	RelOptInfo *targetpart = NULL;
+	List	   *prelinfolist = NIL;
+	bool		doruntimeprune = false;
+	bool		hascontradictingquals = false;
+	ListCell   *lc;
+	int		   *relid_subpart_map;
+	Bitmapset  *subplansfound = NULL;
+	int			i;
+
+	/*
+	 * Construct a temporary array to map from planner relids to index of the
+	 * partitioned_rel.  For convenience, we use 1-based indexes here, so that
+	 * zero can represent an un-filled array entry.
+	 */
+	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
+
 	/*
 	 * relid_subpart_map maps relid of a non-leaf partition to the index in
 	 * 'partitioned_rels' of that rel (which will also be the index in the
-	 * returned PartitionPruneInfo list of the info for that partition).
+	 * returned PartitionedRelPruneInfo list of the info for that partition).
 	 */
 	i = 1;
 	foreach(lc, partitioned_rels)
@@ -246,12 +349,12 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		relid_subpart_map[rti] = i++;
 	}
 
-	/* We now build a PartitionPruneInfo for each partitioned rel */
+	/* We now build a PartitionedRelPruneInfo for each partitioned rel */
 	foreach(lc, partitioned_rels)
 	{
 		Index		rti = lfirst_int(lc);
 		RelOptInfo *subpart = find_base_rel(root, rti);
-		PartitionPruneInfo *pinfo;
+		PartitionedRelPruneInfo *prelinfo;
 		RangeTblEntry *rte;
 		Bitmapset  *present_parts;
 		int			nparts = subpart->nparts;
@@ -269,6 +372,31 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 		if (!targetpart)
 		{
 			targetpart = subpart;
+
+			/*
+			 * When the first listed partitioned table is not the same rel as
+			 * 'parentrel', then we must be dealing with a UNION ALL
+			 * parentrel.  We'd better translate the pruning qual so that it's
+			 * compatible with the top-level partitioned table.  We overwrite
+			 * the input parameter here so that subsequent translations for
+			 * sub-partitioned tables translate from the top-level partitioned
+			 * table, rather than the UNION ALL parent.
+			 */
+			if (parentrel != subpart)
+			{
+				int			nappinfos;
+				AppendRelInfo **appinfos = find_appinfos_by_relids(root,
+																   subpart->relids,
+																   &nappinfos);
+
+				prunequal = (List *) adjust_appendrel_attrs(root, (Node *)
+															prunequal,
+															nappinfos,
+															appinfos);
+
+				pfree(appinfos);
+			}
+
 			partprunequal = prunequal;
 		}
 		else
@@ -320,35 +448,44 @@ make_partition_pruneinfo(PlannerInfo *root, List *partitioned_rels,
 
 			subplan_map[i] = subplanidx;
 			subpart_map[i] = subpartidx;
-			if (subplanidx >= 0 || subpartidx >= 0)
+			if (subplanidx >= 0)
+			{
+				present_parts = bms_add_member(present_parts, i);
+
+				/* Record finding this subplan  */
+				subplansfound = bms_add_member(subplansfound, subplanidx);
+			}
+			else if (subpartidx >= 0)
 				present_parts = bms_add_member(present_parts, i);
 		}
 
+
 		rte = root->simple_rte_array[subpart->relid];
 
-		pinfo = makeNode(PartitionPruneInfo);
-		pinfo->reloid = rte->relid;
-		pinfo->pruning_steps = pruning_steps;
-		pinfo->present_parts = present_parts;
-		pinfo->nparts = nparts;
-		pinfo->subplan_map = subplan_map;
-		pinfo->subpart_map = subpart_map;
+		prelinfo = makeNode(PartitionedRelPruneInfo);
+		prelinfo->reloid = rte->relid;
+		prelinfo->pruning_steps = pruning_steps;
+		prelinfo->present_parts = present_parts;
+		prelinfo->nparts = nparts;
+		prelinfo->subplan_map = subplan_map;
+		prelinfo->subpart_map = subpart_map;
 
 		/* Determine which pruning types should be enabled at this level */
-		doruntimeprune |= analyze_partkey_exprs(pinfo, pruning_steps,
+		doruntimeprune |= analyze_partkey_exprs(prelinfo, pruning_steps,
 												partnatts);
 
-		pinfolist = lappend(pinfolist, pinfo);
+		prelinfolist = lappend(prelinfolist, prelinfo);
 	}
 
-	pfree(relid_subplan_map);
 	pfree(relid_subpart_map);
 
-	if (doruntimeprune)
-		return pinfolist;
+	if (!doruntimeprune)
+		return NIL;
+
+	*matchedsubplans = subplansfound;
 
 	/* No run-time pruning required. */
-	return NIL;
+	return prelinfolist;
 }
 
 /*
@@ -2758,10 +2895,11 @@ pull_exec_paramids_walker(Node *node, Bitmapset **context)
  *		executor startup-time or executor run-time pruning.
  *
  * Returns true if any executor partition pruning should be attempted at this
- * level.  Also fills fields of *pinfo to record how to process each step.
+ * level.  Also fills fields of *prelinfo to record how to process each step.
  */
 static bool
-analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
+analyze_partkey_exprs(PartitionedRelPruneInfo *prelinfo, List *steps,
+					  int partnatts)
 {
 	bool		doruntimeprune = false;
 	ListCell   *lc;
@@ -2771,11 +2909,12 @@ analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 	 * Otherwise, if their expressions aren't simple Consts, they require
 	 * startup-time pruning.
 	 */
-	pinfo->nexprs = list_length(steps) * partnatts;
-	pinfo->hasexecparam = (bool *) palloc0(sizeof(bool) * pinfo->nexprs);
-	pinfo->do_initial_prune = false;
-	pinfo->do_exec_prune = false;
-	pinfo->execparamids = NULL;
+	prelinfo->nexprs = list_length(steps) * partnatts;
+	prelinfo->hasexecparam = (bool *) palloc0(sizeof(bool) *
+											  prelinfo->nexprs);
+	prelinfo->do_initial_prune = false;
+	prelinfo->do_exec_prune = false;
+	prelinfo->execparamids = NULL;
 
 	foreach(lc, steps)
 	{
@@ -2799,16 +2938,16 @@ analyze_partkey_exprs(PartitionPruneInfo *pinfo, List *steps, int partnatts)
 														step->step.step_id,
 														keyno);
 
-				Assert(stateidx < pinfo->nexprs);
+				Assert(stateidx < prelinfo->nexprs);
 				hasexecparams = !bms_is_empty(execparamids);
-				pinfo->hasexecparam[stateidx] = hasexecparams;
-				pinfo->execparamids = bms_join(pinfo->execparamids,
-											   execparamids);
+				prelinfo->hasexecparam[stateidx] = hasexecparams;
+				prelinfo->execparamids = bms_join(prelinfo->execparamids,
+												  execparamids);
 
 				if (hasexecparams)
-					pinfo->do_exec_prune = true;
+					prelinfo->do_exec_prune = true;
 				else
-					pinfo->do_initial_prune = true;
+					prelinfo->do_initial_prune = true;
 
 				doruntimeprune = true;
 			}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 862bf65060..4327fd4cb1 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -113,13 +113,13 @@ typedef struct PartitionTupleRouting
 } PartitionTupleRouting;
 
 /*-----------------------
- * PartitionPruningData - Per-partitioned-table data for run-time pruning
+ * PartitionedRelPruningData - Per-partitioned-table data for run-time pruning
  * of partitions.  For a multilevel partitioned table, we have one of these
  * for the topmost partition plus one for each non-leaf child partition,
  * ordered such that parents appear before their children.
  *
  * subplan_map[] and subpart_map[] have the same definitions as in
- * PartitionPruneInfo (see plannodes.h); though note that here,
+ * PartitionedRelPruneInfo (see plannodes.h); though note that here,
  * subpart_map contains indexes into PartitionPruneState.partprunedata[].
  *
  * subplan_map					Subplan index by partition index, or -1.
@@ -136,7 +136,7 @@ typedef struct PartitionTupleRouting
  *								executor run (for this partitioning level).
  *-----------------------
  */
-typedef struct PartitionPruningData
+typedef struct PartitionedRelPruningData
 {
 	int		   *subplan_map;
 	int		   *subpart_map;
@@ -145,6 +145,17 @@ typedef struct PartitionPruningData
 	List	   *pruning_steps;
 	bool		do_initial_prune;
 	bool		do_exec_prune;
+} PartitionedRelPruningData;
+
+/*
+ * PartitionPruningData - Encapsulates an array of PartitionedRelPruningData
+ * which belong to a single partition hierarchy containing 1 or more
+ * partitions.
+ */
+typedef struct PartitionPruningData
+{
+	int			num_partrelprunedata;
+	PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
 } PartitionPruningData;
 
 /*-----------------------
@@ -158,9 +169,6 @@ typedef struct PartitionPruningData
  * table per parent plan node, hence partprunedata[] need describe only one
  * partitioning hierarchy.
  *
- * partprunedata		Array of PartitionPruningData for the plan's
- *						partitioned relation, ordered such that parent tables
- *						appear before children (hence, topmost table is first).
  * num_partprunedata	Number of items in 'partprunedata' array.
  * do_initial_prune		true if pruning should be performed during executor
  *						startup (at any hierarchy level).
@@ -170,18 +178,27 @@ typedef struct PartitionPruningData
  *						any of the partprunedata structs.  Pruning must be
  *						done again each time the value of one of these
  *						parameters changes.
+ * other_subplans		Contains subplan indexes which don't belong to any
+ *						'partprunedata', e.g. UNION ALL children that are not
+ *						partitioned tables or a partitioned table that the
+ *						planner deemed run-time pruning to be useless for.
+ *						These must not be pruned.
  * prune_context		A short-lived memory context in which to execute the
  *						partition pruning functions.
+ * partprunedata		Array of PartitionPruningData pointers for the plan's
+ *						partitioned relation, ordered such that parent tables
+ *						appear before children (hence, topmost table is first).
  *-----------------------
  */
 typedef struct PartitionPruneState
 {
-	PartitionPruningData *partprunedata;
 	int			num_partprunedata;
 	bool		do_initial_prune;
 	bool		do_exec_prune;
 	Bitmapset  *execparamids;
+	Bitmapset  *other_subplans;
 	MemoryContext prune_context;
+	PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
 } PartitionPruneState;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
@@ -209,7 +226,7 @@ extern HeapTuple ConvertPartitionTupleSlot(TupleConversionMap *map,
 extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
 						PartitionTupleRouting *proute);
 extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
-							  List *partitionpruneinfo);
+							  PartitionPruneInfo *partitionpruneinfo);
 extern void ExecDestroyPartitionPruneState(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
 extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 43f1552241..697d3d7a5f 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -88,6 +88,7 @@ typedef enum NodeTag
 	T_NestLoopParam,
 	T_PlanRowMark,
 	T_PartitionPruneInfo,
+	T_PartitionedRelPruneInfo,
 	T_PartitionPruneStepOp,
 	T_PartitionPruneStepCombine,
 	T_PlanInvalItem,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b80df601cd..a1a782d2f6 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -241,6 +241,8 @@ typedef struct ModifyTable
 	List	   *exclRelTlist;	/* tlist of the EXCLUDED pseudo relation */
 } ModifyTable;
 
+struct PartitionPruneInfo;
+
 /* ----------------
  *	 Append node -
  *		Generate the concatenation of the results of sub-plans.
@@ -260,8 +262,8 @@ typedef struct Append
 	/* RT indexes of non-leaf tables in a partition tree */
 	List	   *partitioned_rels;
 
-	/* Info for run-time subplan pruning, one entry per partitioned_rels */
-	List	   *part_prune_infos;	/* List of PartitionPruneInfo */
+	/* Info for run-time subplan pruning */
+	struct PartitionPruneInfo *part_prune_info;
 } Append;
 
 /* ----------------
@@ -282,8 +284,8 @@ typedef struct MergeAppend
 	Oid		   *collations;		/* OIDs of collations */
 	bool	   *nullsFirst;		/* NULLS FIRST/LAST directions */
 
-	/* Info for run-time subplan pruning, one entry per partitioned_rels */
-	List	   *part_prune_infos;	/* List of PartitionPruneInfo */
+	/* Info for run-time subplan pruning */
+	struct PartitionPruneInfo *part_prune_info;
 } MergeAppend;
 
 /* ----------------
@@ -1054,18 +1056,34 @@ typedef struct PlanRowMark
  */
 
 /*
- * PartitionPruneInfo - Details required to allow the executor to prune
+ * PartitionPruneInfo - Details required to allow the executor to prune
  * partitions.
  *
+ * prune_infos			List of Lists containing PartitionedRelPruneInfo
+ * other_subplans		Indexes of any subplans which are not accounted for
+ *						by any of the PartitionedRelPruneInfo stored in
+ *						'prune_infos'.
+ */
+typedef struct PartitionPruneInfo
+{
+	NodeTag		type;
+	List	   *prune_infos;
+	Bitmapset  *other_subplans;
+} PartitionPruneInfo;
+
+/*
+ * PartitionedRelPruneInfo - Details required to allow the executor to prune
+ * partitions for a single partitioned table.
+ *
  * Here we store mapping details to allow translation of a partitioned table's
  * index as returned by the partition pruning code into subplan indexes for
  * plan types which support arbitrary numbers of subplans, such as Append.
  * We also store various details to tell the executor when it should be
  * performing partition pruning.
  *
- * Each PartitionPruneInfo describes the partitioning rules for a single
+ * Each PartitionedRelPruneInfo describes the partitioning rules for a single
  * partitioned table (a/k/a level of partitioning).  For a multilevel
- * partitioned table, we have a List of PartitionPruneInfos, where the
+ * partitioned table, we have a List of PartitionedRelPruneInfo, where the
  * first entry represents the topmost partitioned table and additional
  * entries represent non-leaf child partitions, ordered such that parents
  * appear before their children.
@@ -1076,11 +1094,12 @@ typedef struct PlanRowMark
  * zero-based index of the partition's subplan in the parent plan's subplan
  * list; it is -1 if the partition is non-leaf or has been pruned.  For a
  * non-leaf partition p, subpart_map[p] contains the zero-based index of
- * that sub-partition's PartitionPruneInfo in the plan's PartitionPruneInfo
- * list; it is -1 if the partition is a leaf or has been pruned.  All these
- * indexes are global across the whole partitioned table and Append plan node.
+ * that sub-partition's PartitionedRelPruneInfo in the plan's
+ * PartitionedRelPruneInfo list; it is -1 if the partition is a leaf or has
+ * been pruned.  All these indexes are global across the whole partitioned
+ * table and the parenting plan node.
  */
-typedef struct PartitionPruneInfo
+typedef struct PartitionedRelPruneInfo
 {
 	NodeTag		type;
 	Oid			reloid;			/* OID of partition rel for this level */
@@ -1098,7 +1117,7 @@ typedef struct PartitionPruneInfo
 	bool		do_exec_prune;	/* true if pruning should be performed during
 								 * executor run. */
 	Bitmapset  *execparamids;	/* All PARAM_EXEC Param IDs in pruning_steps */
-} PartitionPruneInfo;
+} PartitionedRelPruneInfo;
 
 /*
  * Abstract Node type for partition pruning steps (there are no concrete
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 9944d2832f..df3bcb737d 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -74,7 +74,8 @@ typedef struct PartitionPruneContext
 #define PruneCxtStateIdx(partnatts, step_id, keyno) \
 	((partnatts) * (step_id) + (keyno))
 
-extern List *make_partition_pruneinfo(PlannerInfo *root,
+extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root,
+						 RelOptInfo *parentrel,
 						 List *partitioned_rels,
 						 List *subpaths, List *prunequal);
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 38bd179c22..eb65e54969 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2367,6 +2367,96 @@ select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1
                Index Cond: (a = $0)
 (52 rows)
 
+-- Test run-time partition pruning with UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all select * from ab) ab where b = (select 1);
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Append (actual rows=0 loops=1)
+         ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_b1_1 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_b2_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_b3_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Index Cond: (a = 1)
+   ->  Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b3 (never executed)
+         Filter: (b = $0)
+(37 rows)
+
+-- A case containing a UNION ALL with a non-partitioned child.
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all (values(10,5)) union all select * from ab) ab where b = (select 1);
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Append (actual rows=0 loops=1)
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Append (actual rows=0 loops=1)
+         ->  Bitmap Heap Scan on ab_a1_b1 ab_a1_b1_1 (actual rows=0 loops=1)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b1_a_idx (actual rows=0 loops=1)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b2 ab_a1_b2_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b2_a_idx (never executed)
+                     Index Cond: (a = 1)
+         ->  Bitmap Heap Scan on ab_a1_b3 ab_a1_b3_1 (never executed)
+               Recheck Cond: (a = 1)
+               Filter: (b = $0)
+               ->  Bitmap Index Scan on ab_a1_b3_a_idx (never executed)
+                     Index Cond: (a = 1)
+   ->  Result (actual rows=0 loops=1)
+         One-Time Filter: (5 = $0)
+   ->  Seq Scan on ab_a1_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a1_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a2_b3 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b1 (actual rows=0 loops=1)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b2 (never executed)
+         Filter: (b = $0)
+   ->  Seq Scan on ab_a3_b3 (never executed)
+         Filter: (b = $0)
+(39 rows)
+
 deallocate ab_q1;
 deallocate ab_q2;
 deallocate ab_q3;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index e5e7789fc5..9d9b3980f6 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -533,6 +533,14 @@ reset max_parallel_workers_per_gather;
 explain (analyze, costs off, summary off, timing off)
 select * from ab where a = (select max(a) from lprt_a) and b = (select max(a)-1 from lprt_a);
 
+-- Test run-time partition pruning with UNION ALL parents
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all select * from ab) ab where b = (select 1);
+
+-- A case containing a UNION ALL with a non-partitioned child.
+explain (analyze, costs off, summary off, timing off)
+select * from (select * from ab where a = 1 union all (values(10,5)) union all select * from ab) ab where b = (select 1);
+
 deallocate ab_q1;
 deallocate ab_q2;
 deallocate ab_q3;
-- 
2.17.1
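
For readers following the patch, the "inverted set" step that fills
other_subplans in make_partition_pruneinfo can be sketched outside of
PostgreSQL as below. This is only an illustration under assumptions: a plain
64-bit mask stands in for a Bitmapset (so it handles at most 64 subplans),
and the name other_subplans_mask is hypothetical, not something in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the other_subplans computation: bit i of
 * 'matched' is set when subplan i was matched to some partition hierarchy
 * that has useful run-time pruning steps.  The result has bit i set for
 * every subplan index in [0, nsubplans) that was NOT matched, i.e. the
 * subplans that must never be pruned.
 */
uint64_t
other_subplans_mask(uint64_t matched, int nsubplans)
{
	uint64_t	all;

	/* A single 64-bit word limits this sketch to 64 subplans. */
	assert(nsubplans >= 0 && nsubplans <= 64);

	/* Build the full range 0 .. nsubplans - 1, then remove matched members. */
	all = (nsubplans == 64) ? UINT64_MAX : ((UINT64_C(1) << nsubplans) - 1);
	return all & ~matched;
}
```

With five subplans of which indexes 2, 3 and 4 were matched (mask 0x1C), the
result is 0x3, that is, subplans 0 and 1 are the ones left alone.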

#31David Rowley
david.rowley@2ndquadrant.com
In reply to: Alvaro Herrera (#29)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 18 July 2018 at 06:01, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2018-Jul-16, David Rowley wrote:

On 16 July 2018 at 12:55, David Rowley <david.rowley@2ndquadrant.com> wrote:

Thinking about this some more, I don't quite see any reason that the
partitioned_rels for a single hierarchy couldn't just be a Bitmapset
instead of an IntList.

Of course, this is not possible since we can't pass a List of
Bitmapsets to the executor due to Bitmapset not being a node type.

Maybe we can just add a new node type that wraps a lone bitmapset. The
naive implementation (just print out individual members) should be
pretty straightforward; a more sophisticated one (print out the "words")
may end up more compact or not depending on density, but much harder for
debugging, and probably means a catversion bump when BITS_PER_BITMAPWORD
is changed, so probably not a great idea anyway.

I suppose the only reason we haven't done this yet is nobody has needed
it. Sounds like its time has come.
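
To make the "print out individual members" idea concrete, here is a
minimal standalone sketch. It is not code from any patch in this thread:
ToyBitmapset and the toy_* functions are hypothetical stand-ins, and the
"(b 1 5 70)" text form is only assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_BITS_PER_WORD 64
#define TOY_NWORDS 2
#define TOY_OUT_BUFSIZE 1024    /* ample: at most 128 members, 4 chars each */

/* Hypothetical stand-in for a Bitmapset; not the PostgreSQL struct. */
typedef struct ToyBitmapset
{
    uint64_t    words[TOY_NWORDS];
} ToyBitmapset;

static void
toy_add_member(ToyBitmapset *set, int x)
{
    assert(x >= 0 && x < TOY_BITS_PER_WORD * TOY_NWORDS);
    set->words[x / TOY_BITS_PER_WORD] |= (uint64_t) 1 << (x % TOY_BITS_PER_WORD);
}

/*
 * "Naive" output: list each member, e.g. "(b 1 5 70)".  The text lists
 * member values only, so it does not expose the width of a word the way
 * a dump of the raw words would.
 */
static void
toy_out_members(const ToyBitmapset *set, char *buf, size_t buflen)
{
    size_t      used;
    int         x;

    assert(buflen >= TOY_OUT_BUFSIZE);
    used = (size_t) snprintf(buf, buflen, "(b");
    for (x = 0; x < TOY_BITS_PER_WORD * TOY_NWORDS; x++)
    {
        uint64_t    bit = (uint64_t) 1 << (x % TOY_BITS_PER_WORD);

        if (set->words[x / TOY_BITS_PER_WORD] & bit)
            used += (size_t) snprintf(buf + used, buflen - used, " %d", x);
    }
    snprintf(buf + used, buflen - used, ")");
}
```

Because only member values are written, the output stays readable while
debugging, which is the trade-off described above.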

I don't mind doing the work if that's what's wanted, but I'd rather
wait for Tom to provide a bit more input into this as he seems to have
some ideas that I don't understand well enough to write code for.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#32Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#30)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018/07/19 22:03, David Rowley wrote:

v3-0001-Fix-run-time-partition-pruning-for-UNION-ALL-pare.patch

Thanks for updating the patch.

I studied this patch today and concluded that it's going a bit too far by
carrying the nested partition pruning info structures from the planner all
the way into the executor.

As I understand it, the root cause of this issue is that
make_partition_pruneinfo trips when the UNION ALL parent subquery, instead
of the actual individual partitioned root tables, is treated as the root
parent rel when converting prunequals using adjust_appendrel_attrs*. That
happens because the partitioned_rels list is flattened, and
make_partition_pruneinfo assumes that all of its members have the same
root parent and that this parent is an actual partitioned table. That
assumption fails in this case because the parent is really the UNION ALL
subquery rel.

I think the fix implemented in the patch by modifying allpaths.c is
correct: the partition hierarchies are preserved by having nested Lists of
partitioned rels. So, the partitioned_rels List corresponding to the
UNION ALL subquery parent itself contains Lists corresponding to the
partitioned tables appearing under it. With that,
make_partition_pruneinfo (actually, make_partitionedrel_pruneinfo in the
patch) can correctly perform its work for every sub-List, because for each
sub-List, it can identify the correct root partitioned table parent to use.

But I don't think the result of make_partition_pruneinfo itself has to be
a List of PartitionedRelPruneInfo nested under PartitionPruneInfo. I gather
that each PartitionPruneInfo corresponds to one root partitioned table
and that a PartitionedRelPruneInfo contains the actual pruning information,
which is created for every partitioned table (non-leaf tables), including
the root tables. I don't think such nesting is necessary. I think that,
just like the flattened partitioned_rels list, we should put a flattened
list of PartitionedRelPruneInfo into the Append or MergeAppend plan. No
need for nesting PartitionedRelPruneInfo under PartitionPruneInfo.
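
As a rough illustration of the nested-versus-flattened shapes being
discussed, here is a standalone sketch that uses plain C arrays in place
of PostgreSQL Lists. ToyHierarchy and toy_flatten_partitioned_rels are
hypothetical names, and the RT index values in the usage note are made up.

```c
#include <assert.h>

/* One partition hierarchy: the RT indexes of its partitioned rels. */
typedef struct ToyHierarchy
{
    const int  *rtis;
    int         nrels;
} ToyHierarchy;

/*
 * Concatenate all hierarchies into 'out', keeping each hierarchy's
 * members together and in their original order.  Returns the number of
 * entries written.
 */
static int
toy_flatten_partitioned_rels(const ToyHierarchy *hier, int nhier,
                             int *out, int outcap)
{
    int         n = 0;
    int         h;
    int         i;

    for (h = 0; h < nhier; h++)
    {
        for (i = 0; i < hier[h].nrels; i++)
        {
            assert(n < outcap);
            out[n++] = hier[h].rtis[i];
        }
    }
    return n;
}
```

With hierarchies {4, 6} and {9, 11, 12}, the flattened result is
{4, 6, 9, 11, 12}; note that the boundary between the two hierarchies is
no longer visible in the flat form.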

We create a relid_subplan_map from the flattened list of sub-plans, where
sub-plans of leaf partitions of different partitioned tables appear in the
same list. Similarly, I think we should create relid_subpart_map from
the flattened list of partitioned_rels, where partitioned rel RT indexes
coming from different partitioned tables appear in the same list.
Currently relid_subpart_map seems to be constructed separately for each
sub-List of the nested partitioned_rels list, so while the subplan_map of
each PartitionedRelPruneInfo contains indexes into a global array of leaf
partition sub-plans, subpart_map contains indexes into a local array of
PartitionedRelPruneInfo for that partitioned table. But I think there is
no big hurdle in making even the latter contain indexes into a global
array of PartitionedRelPruneInfos of *all* partitioned tables.
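
The global mapping described above can be sketched in isolation as
follows. This is a simplified standalone version, not the patch code:
toy_build_relid_subpart_map is a hypothetical name, calloc stands in for
palloc0, and array_size plays the role of root->simple_rel_array_size.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Map each RT index in the flattened list to its 1-based position in
 * that list.  Zero means "not a partitioned rel in this Append".
 * Returns NULL on allocation failure; the caller frees the result.
 */
static int *
toy_build_relid_subpart_map(const int *flat_rtis, int nrels, int array_size)
{
    int        *map = calloc((size_t) array_size, sizeof(int));
    int         i;

    if (map == NULL)
        return NULL;

    for (i = 0; i < nrels; i++)
    {
        int         rti = flat_rtis[i];

        assert(rti > 0 && rti < array_size);
        /* No duplicates please */
        assert(map[rti] == 0);
        map[rti] = i + 1;
    }
    return map;
}
```

Because positions are taken from the flattened list, rels from a second
hierarchy get indexes that continue after the first hierarchy's instead
of restarting at 1.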

On the executor side, instead of having PartitionedRelPruningData be
nested under PartitionPruningData, which in turn is stored in the
top-level PartitionPruneState, we can store them directly in
PartitionPruneState, since we're making the planner put global indexes
into subpart_map. A slight adjustment seems to be needed to make
ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans skip the
PartitionedRelPruningData of non-root tables, because
find_matching_subplans_recurse takes care of recursing to non-root ones.
Actually, not skipping them seems to cause wrong results.

To verify that such an approach would actually work, I modified the
relevant parts of your patch and confirmed that it does. See attached a
delta patch.

Thanks,
Amit

PS: Other than the main patch, I think it would be nice to add an RT index
field to PartitionedRelPruneInfo in addition to the existing Oid field.
It would help to identify and fetch the Relation from a hypothetical
executor-local array of Relation pointers which is addressable by RT indexes.

Attachments:

v3-0001-delta.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index dac789d414..f81ff672ca 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -48,7 +48,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 bool *isnull,
 									 int maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_matching_subplans_recurse(PartitionPruningData *pprune,
+static void find_matching_subplans_recurse(PartitionPruneState *prunestate,
 							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans);
@@ -1396,14 +1396,10 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  *
  * 'partitionpruneinfo' is a PartitionPruneInfo as generated by
  * make_partition_pruneinfo.  Here we build a PartitionPruneState containing a
- * PartitionPruningData for each partitionpruneinfo->prune_infos, in
- * turn, a PartitionedRelPruningData is created for each
- * PartitionedRelPruneInfo stored in each 'prune_infos'.  This two-level system
- * is required in order to support run-time pruning with UNION ALL parents
- * containing one or more partitioned tables as children.  The data stored in
- * each PartitionedRelPruningData can be re-used each time we re-evaluate
- * which partitions match the pruning steps provided in each
- * PartitionedRelPruneInfo.
+ * PartitionedRelPruningData for each PartitionedRelPruneInfo
+ * in partitionpruneinfo->prune_infos.  The data stored in each
+ * PartitionedRelPruningData can be re-used each time we re-evaluate which
+ * partitions match the pruning steps provided in each PartitionedRelPruneInfo.
  */
 PartitionPruneState *
 ExecCreatePartitionPruneState(PlanState *planstate,
@@ -1422,8 +1418,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
 	 * Allocate the data structure
 	 */
 	prunestate = (PartitionPruneState *)
-		palloc(offsetof(PartitionPruneState, partprunedata) +
-			   sizeof(PartitionPruningData *) * n_part_hierarchies);
+		palloc(offsetof(PartitionPruneState, partrelprunedata) +
+			   sizeof(PartitionedRelPruningData) * n_part_hierarchies);
 
 	prunestate->num_partprunedata = n_part_hierarchies;
 	prunestate->do_initial_prune = false;	/* may be set below */
@@ -1445,125 +1441,109 @@ ExecCreatePartitionPruneState(PlanState *planstate,
 	i = 0;
 	foreach(lc, partitionpruneinfo->prune_infos)
 	{
-		ListCell   *lc2;
-		List	   *partrelpruneinfos = lfirst_node(List, lc);
-		PartitionPruningData *prunedata;
-		int			npartrelpruneinfos = list_length(partrelpruneinfos);
-		int			j;
+		PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc));
+		PartitionedRelPruningData *prelprune = &prunestate->partrelprunedata[i];
+		PartitionPruneContext *context = &prelprune->context;
+		PartitionDesc partdesc;
+		PartitionKey partkey;
+		int			partnatts;
+		int			n_steps;
+		ListCell   *lc3;
 
-		prunedata = palloc(offsetof(PartitionPruningData, partrelprunedata) +
-						   npartrelpruneinfos * sizeof(PartitionedRelPruningData));
-		prunestate->partprunedata[i] = prunedata;
-		prunedata->num_partrelprunedata = npartrelpruneinfos;
+		/*
+		 * We must copy the subplan_map rather than pointing directly to
+		 * the plan's version, as we may end up making modifications to it
+		 * later.
+		 */
+		prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
+		memcpy(prelprune->subplan_map, pinfo->subplan_map,
+			   sizeof(int) * pinfo->nparts);
 
-		j = 0;
-		foreach(lc2, partrelpruneinfos)
+		/* We can use the subpart_map verbatim, since we never modify it */
+		prelprune->subpart_map = pinfo->subpart_map;
+
+		/* present_parts is also subject to later modification */
+		prelprune->present_parts = bms_copy(pinfo->present_parts);
+
+		/*
+		 * We need to hold a pin on the partitioned table's relcache entry
+		 * so that we can rely on its copies of the table's partition key
+		 * and partition descriptor.  We need not get a lock though; one
+		 * should have been acquired already by InitPlan or
+		 * ExecLockNonLeafAppendTables.
+		 */
+		context->partrel = relation_open(pinfo->reloid, NoLock);
+
+		partkey = RelationGetPartitionKey(context->partrel);
+		partdesc = RelationGetPartitionDesc(context->partrel);
+		n_steps = list_length(pinfo->pruning_steps);
+
+		context->strategy = partkey->strategy;
+		context->partnatts = partnatts = partkey->partnatts;
+		context->nparts = pinfo->nparts;
+		context->boundinfo = partdesc->boundinfo;
+		context->partcollation = partkey->partcollation;
+		context->partsupfunc = partkey->partsupfunc;
+
+		/* We'll look up type-specific support functions as needed */
+		context->stepcmpfuncs = (FmgrInfo *)
+			palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
+
+		context->ppccontext = CurrentMemoryContext;
+		context->planstate = planstate;
+
+		/* Initialize expression state for each expression we need */
+		context->exprstates = (ExprState **)
+			palloc0(sizeof(ExprState *) * n_steps * partnatts);
+		foreach(lc3, pinfo->pruning_steps)
 		{
-			PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc2));
-			PartitionedRelPruningData *prelprune = &prunedata->partrelprunedata[j];
-			PartitionPruneContext *context = &prelprune->context;
-			PartitionDesc partdesc;
-			PartitionKey partkey;
-			int			partnatts;
-			int			n_steps;
-			ListCell   *lc3;
+			PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc3);
+			ListCell   *lc4;
+			int			keyno;
 
-			/*
-			 * We must copy the subplan_map rather than pointing directly to
-			 * the plan's version, as we may end up making modifications to it
-			 * later.
-			 */
-			prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
-			memcpy(prelprune->subplan_map, pinfo->subplan_map,
-				   sizeof(int) * pinfo->nparts);
+			/* not needed for other step kinds */
+			if (!IsA(step, PartitionPruneStepOp))
+				continue;
 
-			/* We can use the subpart_map verbatim, since we never modify it */
-			prelprune->subpart_map = pinfo->subpart_map;
+			Assert(list_length(step->exprs) <= partnatts);
 
-			/* present_parts is also subject to later modification */
-			prelprune->present_parts = bms_copy(pinfo->present_parts);
-
-			/*
-			 * We need to hold a pin on the partitioned table's relcache entry
-			 * so that we can rely on its copies of the table's partition key
-			 * and partition descriptor.  We need not get a lock though; one
-			 * should have been acquired already by InitPlan or
-			 * ExecLockNonLeafAppendTables.
-			 */
-			context->partrel = relation_open(pinfo->reloid, NoLock);
-
-			partkey = RelationGetPartitionKey(context->partrel);
-			partdesc = RelationGetPartitionDesc(context->partrel);
-			n_steps = list_length(pinfo->pruning_steps);
-
-			context->strategy = partkey->strategy;
-			context->partnatts = partnatts = partkey->partnatts;
-			context->nparts = pinfo->nparts;
-			context->boundinfo = partdesc->boundinfo;
-			context->partcollation = partkey->partcollation;
-			context->partsupfunc = partkey->partsupfunc;
-
-			/* We'll look up type-specific support functions as needed */
-			context->stepcmpfuncs = (FmgrInfo *)
-				palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
-
-			context->ppccontext = CurrentMemoryContext;
-			context->planstate = planstate;
-
-			/* Initialize expression state for each expression we need */
-			context->exprstates = (ExprState **)
-				palloc0(sizeof(ExprState *) * n_steps * partnatts);
-			foreach(lc3, pinfo->pruning_steps)
+			keyno = 0;
+			foreach(lc4, step->exprs)
 			{
-				PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc3);
-				ListCell   *lc4;
-				int			keyno;
+				Expr	   *expr = (Expr *) lfirst(lc4);
 
-				/* not needed for other step kinds */
-				if (!IsA(step, PartitionPruneStepOp))
-					continue;
-
-				Assert(list_length(step->exprs) <= partnatts);
-
-				keyno = 0;
-				foreach(lc4, step->exprs)
+				/* not needed for Consts */
+				if (!IsA(expr, Const))
 				{
-					Expr	   *expr = (Expr *) lfirst(lc4);
+					int			stateidx = PruneCxtStateIdx(partnatts,
+															step->step.step_id,
+															keyno);
 
-					/* not needed for Consts */
-					if (!IsA(expr, Const))
-					{
-						int			stateidx = PruneCxtStateIdx(partnatts,
-																step->step.step_id,
-																keyno);
-
-						context->exprstates[stateidx] =
-							ExecInitExpr(expr, context->planstate);
-					}
-					keyno++;
+					context->exprstates[stateidx] =
+						ExecInitExpr(expr, context->planstate);
 				}
+				keyno++;
 			}
-
-			/* Array is not modified at runtime, so just point to plan's copy */
-			context->exprhasexecparam = pinfo->hasexecparam;
-
-			prelprune->pruning_steps = pinfo->pruning_steps;
-			prelprune->do_initial_prune = pinfo->do_initial_prune;
-			prelprune->do_exec_prune = pinfo->do_exec_prune;
-
-			/* Record if pruning would be useful at any level */
-			prunestate->do_initial_prune |= pinfo->do_initial_prune;
-			prunestate->do_exec_prune |= pinfo->do_exec_prune;
-
-			/*
-			 * Accumulate the IDs of all PARAM_EXEC Params affecting the
-			 * partitioning decisions at this plan node.
-			 */
-			prunestate->execparamids = bms_add_members(prunestate->execparamids,
-													   pinfo->execparamids);
-
-			j++;
 		}
+
+		/* Array is not modified at runtime, so just point to plan's copy */
+		context->exprhasexecparam = pinfo->hasexecparam;
+
+		prelprune->pruning_steps = pinfo->pruning_steps;
+		prelprune->do_initial_prune = pinfo->do_initial_prune;
+		prelprune->do_exec_prune = pinfo->do_exec_prune;
+
+		/* Record if pruning would be useful at any level */
+		prunestate->do_initial_prune |= pinfo->do_initial_prune;
+		prunestate->do_exec_prune |= pinfo->do_exec_prune;
+
+		/*
+		 * Accumulate the IDs of all PARAM_EXEC Params affecting the
+		 * partitioning decisions at this plan node.
+		 */
+		prunestate->execparamids = bms_add_members(prunestate->execparamids,
+												   pinfo->execparamids);
+
 		i++;
 	}
 	return prunestate;
@@ -1579,17 +1559,14 @@ ExecCreatePartitionPruneState(PlanState *planstate,
 void
 ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 {
-	PartitionPruningData **partprunedata = prunestate->partprunedata;
+	PartitionedRelPruningData *partrelprunedata = prunestate->partrelprunedata;
 	int			i;
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune = partprunedata[i];
-		PartitionedRelPruningData *prunedata = pprune->partrelprunedata;
-		int			j;
+		PartitionedRelPruningData prunedata = partrelprunedata[i];
 
-		for (j = 0; j < pprune->num_partrelprunedata; j++)
-			relation_close(prunedata[j].context.partrel, NoLock);
+		relation_close(prunedata.context.partrel, NoLock);
 	}
 }
 
@@ -1623,14 +1600,16 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune;
 		PartitionedRelPruningData *prelprune;
 
-		pprune = prunestate->partprunedata[i];
-		prelprune = &pprune->partrelprunedata[0];
+		prelprune = &prunestate->partrelprunedata[i];
+
+		/* Only call find_matching_subplans_recurse on root table. */
+		if (prelprune->context.partrel->rd_rel->relispartition)
+			continue;
 
 		/* Perform pruning without using PARAM_EXEC Params */
-		find_matching_subplans_recurse(pprune, prelprune, true, &result);
+		find_matching_subplans_recurse(prunestate, prelprune, true, &result);
 
 		/* Expression eval may have used space in node's ps_ExprContext too */
 		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
@@ -1681,58 +1660,50 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 		 * 'present_parts'.
 		 */
 
-		for (i = 0; i < prunestate->num_partprunedata; i++)
+		for (i = prunestate->num_partprunedata - 1; i >= 0; i--)
 		{
-			int			j;
-			PartitionPruningData *prunedata;
+			PartitionedRelPruningData *pprune;
+			int			nparts;
+			int			k;
 
-			prunedata = prunestate->partprunedata[i];
+			pprune = &prunestate->partrelprunedata[i];
+			nparts = pprune->context.nparts;
+			/* We just rebuild present_parts from scratch */
+			bms_free(pprune->present_parts);
+			pprune->present_parts = NULL;
 
-			for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+			for (k = 0; k < nparts; k++)
 			{
-				PartitionedRelPruningData *pprune;
-				int			nparts;
-				int			k;
+				int			oldidx = pprune->subplan_map[k];
+				int			subidx;
 
-				pprune = &prunedata->partrelprunedata[j];
-				nparts = pprune->context.nparts;
-				/* We just rebuild present_parts from scratch */
-				bms_free(pprune->present_parts);
-				pprune->present_parts = NULL;
-
-				for (k = 0; k < nparts; k++)
+				/*
+				 * If this partition existed as a subplan then change the
+				 * old subplan index to the new subplan index.  The new
+				 * index may become -1 if the partition was pruned above,
+				 * or it may just come earlier in the subplan list due to
+				 * some subplans being removed earlier in the list.  If
+				 * it's a subpartition, add it to present_parts unless
+				 * it's entirely pruned.
+				 */
+				if (oldidx >= 0)
 				{
-					int			oldidx = pprune->subplan_map[k];
-					int			subidx;
+					Assert(oldidx < nsubplans);
+					pprune->subplan_map[k] = new_subplan_indexes[oldidx];
 
-					/*
-					 * If this partition existed as a subplan then change the
-					 * old subplan index to the new subplan index.  The new
-					 * index may become -1 if the partition was pruned above,
-					 * or it may just come earlier in the subplan list due to
-					 * some subplans being removed earlier in the list.  If
-					 * it's a subpartition, add it to present_parts unless
-					 * it's entirely pruned.
-					 */
-					if (oldidx >= 0)
-					{
-						Assert(oldidx < nsubplans);
-						pprune->subplan_map[k] = new_subplan_indexes[oldidx];
+					if (new_subplan_indexes[oldidx] >= 0)
+						pprune->present_parts =
+							bms_add_member(pprune->present_parts, k);
+				}
+				else if ((subidx = pprune->subpart_map[k]) >= 0)
+				{
+					PartitionedRelPruningData *subprune;
 
-						if (new_subplan_indexes[oldidx] >= 0)
-							pprune->present_parts =
-								bms_add_member(pprune->present_parts, k);
-					}
-					else if ((subidx = pprune->subpart_map[k]) >= 0)
-					{
-						PartitionedRelPruningData *subprune;
+					subprune = &prunestate->partrelprunedata[subidx];
 
-						subprune = &prunedata->partrelprunedata[subidx];
-
-						if (!bms_is_empty(subprune->present_parts))
-							pprune->present_parts =
-								bms_add_member(pprune->present_parts, k);
-					}
+					if (!bms_is_empty(subprune->present_parts))
+						pprune->present_parts =
+							bms_add_member(pprune->present_parts, k);
 				}
 			}
 		}
@@ -1764,18 +1735,21 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune;
 		PartitionedRelPruningData *prelprune;
 
-		pprune = prunestate->partprunedata[i];
-		prelprune = &pprune->partrelprunedata[0];
+		prelprune = &prunestate->partrelprunedata[i];
 
-		find_matching_subplans_recurse(pprune, prelprune, false, &result);
+		/* Only call find_matching_subplans_recurse on root table. */
+		if (prelprune->context.partrel->rd_rel->relispartition)
+			continue;
+
+		find_matching_subplans_recurse(prunestate, prelprune, false, &result);
 
 		/* Expression eval may have used space in node's ps_ExprContext too */
 		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
 	}
 
+
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
@@ -1797,7 +1771,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
  * Adds valid (non-prunable) subplan IDs to *validsubplans
  */
 static void
-find_matching_subplans_recurse(PartitionPruningData *pprune,
+find_matching_subplans_recurse(PartitionPruneState *prunestate,
 							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans)
@@ -1841,8 +1815,8 @@ find_matching_subplans_recurse(PartitionPruningData *pprune,
 			int			partidx = prelprune->subpart_map[i];
 
 			if (partidx >= 0)
-				find_matching_subplans_recurse(pprune,
-											   &pprune->partrelprunedata[partidx],
+				find_matching_subplans_recurse(prunestate,
+											   &prunestate->partrelprunedata[partidx],
 											   initial_prune, validsubplans);
 			else
 			{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f9e6ad3ab7..c7872661c4 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1033,6 +1033,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
 	PartitionPruneInfo *partpruneinfo = NULL;
+	List	   *flattened_partitioned_rels = NIL;
 
 	/*
 	 * The subpaths list could be empty, if every child was proven empty by
@@ -1083,6 +1084,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 		prunequal = extract_actual_clauses(rel->baserestrictinfo, false);
 
+		flattened_partitioned_rels =
+					flatten_partitioned_rels(best_path->partitioned_rels);
+
 		if (best_path->path.param_info)
 		{
 			List	   *prmquals = best_path->path.param_info->ppi_clauses;
@@ -1098,6 +1102,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 			partpruneinfo =
 				make_partition_pruneinfo(root, rel,
 										 best_path->partitioned_rels,
+										 flattened_partitioned_rels,
 										 best_path->subpaths, prunequal);
 	}
 
@@ -1109,7 +1114,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	 */
 
 	plan = make_append(subplans, best_path->first_partial_path,
-					   tlist, best_path->partitioned_rels,
+					   tlist, flattened_partitioned_rels,
 					   partpruneinfo);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1135,6 +1140,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
 	PartitionPruneInfo *partpruneinfo = NULL;
+	List	   *flattened_partitioned_rels = NIL;
 
 	/*
 	 * We don't have the actual creation of the MergeAppend node split out
@@ -1233,6 +1239,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 
 		prunequal = extract_actual_clauses(rel->baserestrictinfo, false);
 
+		flattened_partitioned_rels =
+					flatten_partitioned_rels(best_path->partitioned_rels);
+
 		if (best_path->path.param_info)
 		{
 
@@ -1247,12 +1256,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 
 		if (prunequal != NIL)
 			partpruneinfo = make_partition_pruneinfo(root, rel,
-													  best_path->partitioned_rels,
-													  best_path->subpaths, prunequal);
+													 best_path->partitioned_rels,
+													 flattened_partitioned_rels,
+													 best_path->subpaths, prunequal);
 	}
 
-	node->partitioned_rels =
-		flatten_partitioned_rels(best_path->partitioned_rels);
+	node->partitioned_rels = flattened_partitioned_rels;
 	node->mergeplans = subplans;
 	node->part_prune_info = partpruneinfo;
 
@@ -5006,7 +5015,7 @@ bitmap_subplan_mark_shared(Plan *plan)
 /*
  * flatten_partitioned_rels
  *		Convert List of Lists into a single List with all elements from the
-*		sub-lists.
+ *		sub-lists.
  */
 static List *
 flatten_partitioned_rels(List *partitioned_rels)
@@ -5380,8 +5389,9 @@ make_append(List *appendplans, int first_partial_plan,
 	plan->righttree = NULL;
 	node->appendplans = appendplans;
 	node->first_partial_plan = first_partial_plan;
-	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
+	node->partitioned_rels = partitioned_rels;
 	node->part_prune_info = partpruneinfo;
+
 	return node;
 }
 
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9ce216c28b..490eb4090d 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -114,6 +114,7 @@ typedef struct PruneStepResult
 static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
 							  RelOptInfo *parentrel,
 							  int *relid_subplan_map,
+							  int *relid_subpart_map,
 							  List *partitioned_rels, List *prunequal,
 							  Bitmapset **matchedsubplans);
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
@@ -195,12 +196,15 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context,
  */
 PartitionPruneInfo *
 make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
-						 List *partitioned_rels, List *subpaths,
+						 List *partitioned_rels,
+						 List *flattened_partitioned_rels,
+						 List *subpaths,
 						 List *prunequal)
 {
 	PartitionPruneInfo *pruneinfo;
 	Bitmapset  *allmatchedsubplans = NULL;
 	int		   *relid_subplan_map;
+	int		   *relid_subpart_map;
 	ListCell   *lc;
 	List	   *prunerelinfos;
 	int			i;
@@ -230,6 +234,38 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		relid_subplan_map[pathrel->relid] = i++;
 	}
 
+	/*
+	 * Construct a temporary array to map from planner relids to index of the
+	 * partitioned_rel.  For convenience, we use 1-based indexes here, so that
+	 * zero can represent an un-filled array entry.
+	 *
+	 * Also, since we're going to flatten the list before putting it into the
+	 * plan, use indexes into the flattened list in the mapping arrays of
+	 * resulting PartitionedRelPruneInfo nodes, instead of indexes into
+	 * individual sub-lists.
+	 */
+	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
+
+	/*
+	 * relid_subpart_map maps relid of a non-leaf partition to the index in
+	 * 'partitioned_rels' of that rel (which will also be the index in the
+	 * returned PartitionedRelPruneInfo list of the info for that partition).
+	 */
+	i = 1;
+	foreach(lc, flattened_partitioned_rels)
+	{
+		Index		rti = lfirst_int(lc);
+
+		Assert(rti < root->simple_rel_array_size);
+		/* No duplicates please */
+		Assert(relid_subpart_map[rti] == 0);
+		/* Same rel cannot be both leaf and non-leaf */
+		Assert(relid_subplan_map[rti] == 0);
+
+		relid_subpart_map[rti] = i++;
+	}
+
+
 	Assert(partitioned_rels->type == T_List);
 
 	prunerelinfos = NIL;
@@ -243,19 +279,22 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 
 		prelinfolist = make_partitionedrel_pruneinfo(root, parentrel,
 													 relid_subplan_map,
+													 relid_subpart_map,
 													 rels, prunequal,
 													 &matchedsubplans);
 
 		/* When pruning is possible, record the matched subplans */
 		if (prelinfolist != NIL)
 		{
-			prunerelinfos = lappend(prunerelinfos, prelinfolist);
+			prunerelinfos = list_concat(prunerelinfos,
+										list_copy(prelinfolist));
 			allmatchedsubplans = bms_join(matchedsubplans,
 										  allmatchedsubplans);
 		}
 	}
 
 	pfree(relid_subplan_map);
+	pfree(relid_subpart_map);
 
 	/*
 	 * if none of the partition hierarchies had any useful run-time pruning
@@ -310,45 +349,17 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
  */
 static List *
 make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
-							  int *relid_subplan_map,
+							  int *relid_subplan_map, int *relid_subpart_map,
 							  List *partitioned_rels, List *prunequal,
 							  Bitmapset **matchedsubplans)
 {
 	RelOptInfo *targetpart = NULL;
 	List	   *prelinfolist = NIL;
 	bool		doruntimeprune = false;
-	bool		hascontradictingquals = false;
 	ListCell   *lc;
-	int		   *relid_subpart_map;
 	Bitmapset  *subplansfound = NULL;
 	int			i;
 
-	/*
-	 * Construct a temporary array to map from planner relids to index of the
-	 * partitioned_rel.  For convenience, we use 1-based indexes here, so that
-	 * zero can represent an un-filled array entry.
-	 */
-	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
-
-	/*
-	 * relid_subpart_map maps relid of a non-leaf partition to the index in
-	 * 'partitioned_rels' of that rel (which will also be the index in the
-	 * returned PartitionedRelPruneInfo list of the info for that partition).
-	 */
-	i = 1;
-	foreach(lc, partitioned_rels)
-	{
-		Index		rti = lfirst_int(lc);
-
-		Assert(rti < root->simple_rel_array_size);
-		/* No duplicates please */
-		Assert(relid_subpart_map[rti] == 0);
-		/* Same rel cannot be both leaf and non-leaf */
-		Assert(relid_subplan_map[rti] == 0);
-
-		relid_subpart_map[rti] = i++;
-	}
-
 	/* We now build a PartitionedRelPruneInfo for each partitioned rel */
 	foreach(lc, partitioned_rels)
 	{
@@ -477,8 +488,6 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		prelinfolist = lappend(prelinfolist, prelinfo);
 	}
 
-	pfree(relid_subpart_map);
-
 	if (!doruntimeprune)
 		return NIL;
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 4327fd4cb1..5b6acac4f0 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -147,17 +147,6 @@ typedef struct PartitionedRelPruningData
 	bool		do_exec_prune;
 } PartitionedRelPruningData;
 
-/*
- * PartitionPruningData - Encapsulates an array of PartitionedRelPruningData
- * which belong to a single partition hierarchy containing 1 or more
- * partitions.
- */
-typedef struct PartitionPruningData
-{
-	int			num_partrelprunedata;
-	PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
-} PartitionPruningData;
-
 /*-----------------------
  * PartitionPruneState - State object required for plan nodes to perform
  * run-time partition pruning.
@@ -198,7 +187,7 @@ typedef struct PartitionPruneState
 	Bitmapset  *execparamids;
 	Bitmapset  *other_subplans;
 	MemoryContext prune_context;
-	PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
+	PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
 } PartitionPruneState;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index df3bcb737d..79398d1cc1 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -77,6 +77,7 @@ typedef struct PartitionPruneContext
 extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root,
 						 RelOptInfo *parentrel,
 						 List *partitioned_rels,
+						 List *flattened_partitioned_rels,
 						 List *subpaths, List *prunequal);
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
#33David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#32)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 20 July 2018 at 21:44, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

> But I don't think the result of make_partition_pruneinfo itself has to be
> List of PartitionedRelPruneInfo nested under PartitionPruneInfo. I gather
> that each PartitionPruneInfo corresponds to each root partitioned table
> and a PartitionedRelPruneInfo contains the actual pruning information,
> which is created for every partitioned table (non-leaf tables), including
> the root tables. I don't think such nesting is necessary. I think that
> just like flattened partitioned_rels list, we should put flattened list of
> PartitionedRelPruneInfo into the Append or MergeAppend plan. No need for
> nesting PartitionedRelPruneInfo under PartitionPruneInfo.

To do that properly you'd need to mark the target partitioned table of
each hierarchy. Your test of pg_class.relispartition is not going to
work as you're assuming the query is always going to the root. The
user might query some intermediate partitioned table (which will have
relispartition = true). Your patch will fall flat in that case.
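
To make the failure mode concrete, here is a minimal toy sketch in C. It is
not PostgreSQL code: ToyRel, plan_parent and both helper functions are
hypothetical names invented for illustration, and the table names are only an
assumed example of a query that targets an intermediate partitioned table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one partitioned table in a plan. */
typedef struct ToyRel
{
	const char *name;
	bool		relispartition;	/* is this table itself a partition? */
	int			plan_parent;	/* index of its parent within this plan's
								 * array, or -1 if the query targets it */
} ToyRel;

/* Heuristic under discussion: a root is any rel that is not a partition. */
static int
toy_count_roots_by_relispartition(const ToyRel *rels, int nrels)
{
	int			nroots = 0;
	int			i;

	for (i = 0; i < nrels; i++)
	{
		if (!rels[i].relispartition)
			nroots++;
	}
	return nroots;
}

/* Alternative: a root is whichever rel has no parent inside this plan. */
static int
toy_count_roots_by_plan_parent(const ToyRel *rels, int nrels)
{
	int			nroots = 0;
	int			i;

	for (i = 0; i < nrels; i++)
	{
		if (rels[i].plan_parent < 0)
			nroots++;
	}
	return nroots;
}
```

If the query targets an intermediate table "q1" (itself a partition) that has
a sub-partitioned child "q11", the relispartition heuristic finds no root at
all, whereas tracking the target explicitly finds exactly one.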

You could work around that by having some array that points to the
target partitioned table of each hierarchy, but I don't see why that's
better than having the additional struct. There's also some code
inside ExecFindInitialMatchingSubPlans() which does a backward scan
over the partitions. This must process children before their parents.
Unsure how well that's going to work if we start mixing the
hierarchies. I'm sure it can be made to work providing each hierarchy
is stored together consecutively in the array, but it just seems
pretty fragile to me. That code is already pretty hard to follow.
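
As a rough illustration of the ordering requirement, the following toy C
sketch rebuilds a per-table "present" set over a flat array by scanning
backward. It assumes each hierarchy is stored consecutively with a parent
before its children; ToyPruneData and toy_rebuild_present are hypothetical
names, and the real code uses Bitmapsets rather than bool arrays.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PARTS 4

/* Hypothetical, simplified per-partitioned-table pruning data. */
typedef struct ToyPruneData
{
	int			nparts;
	int			subplan_map[TOY_MAX_PARTS];	/* >= 0: leaf's subplan index */
	int			subpart_map[TOY_MAX_PARTS];	/* >= 0: index of the child
											 * table's entry in the array */
	bool		present[TOY_MAX_PARTS];
	int			npresent;
} ToyPruneData;

/*
 * Scan backward so that every child entry is rebuilt before the parent
 * entry that consults it.
 */
static void
toy_rebuild_present(ToyPruneData *data, int ndata,
					const bool *subplan_survives)
{
	int			i;
	int			k;

	for (i = ndata - 1; i >= 0; i--)
	{
		ToyPruneData *p = &data[i];

		p->npresent = 0;
		for (k = 0; k < p->nparts; k++)
		{
			bool		keep = false;

			if (p->subplan_map[k] >= 0)
				keep = subplan_survives[p->subplan_map[k]];
			else if (p->subpart_map[k] >= 0)
			{
				/* The assumption this whole scheme rests on. */
				assert(p->subpart_map[k] > i);
				keep = data[p->subpart_map[k]].npresent > 0;
			}

			p->present[k] = keep;
			if (keep)
				p->npresent++;
		}
	}
}
```

The assert marks the fragile part: if a child entry ever ended up stored
before its parent, the parent would read a stale npresent value.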

What's the reason you don't like the additional level to represent
multiple hierarchies?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#34Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David Rowley (#33)
2 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018/07/21 0:17, David Rowley wrote:

> On 20 July 2018 at 21:44, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>
>> But I don't think the result of make_partition_pruneinfo itself has to be
>> List of PartitionedRelPruneInfo nested under PartitionPruneInfo. I gather
>> that each PartitionPruneInfo corresponds to each root partitioned table
>> and a PartitionedRelPruneInfo contains the actual pruning information,
>> which is created for every partitioned table (non-leaf tables), including
>> the root tables. I don't think such nesting is necessary. I think that
>> just like flattened partitioned_rels list, we should put flattened list of
>> PartitionedRelPruneInfo into the Append or MergeAppend plan. No need for
>> nesting PartitionedRelPruneInfo under PartitionPruneInfo.
>
> To do that properly you'd need to mark the target partitioned table of
> each hierarchy. Your test of pg_class.relispartition is not going to
> work as you're assuming the query is always going to the root. The
> user might query some intermediate partitioned table (which will have
> relispartition = true). Your patch will fall flat in that case.

Yeah, I forgot to consider that.

> You could work around that by having some array that points to the
> target partitioned table of each hierarchy, but I don't see why that's
> better than having the additional struct.

Or it could be a Bitmapset called root_indexes that stores the offset of
the first Index value in every partitioned_rels list contained in turn in
the list that's passed to make_partition_pruneinfo.
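
A minimal toy sketch of that idea in C, assuming the only input needed is the
length of each sub-list: the offset of each sub-list's first element in the
flattened list is recorded as a set bit. toy_root_indexes is a hypothetical
name, and a uint32_t mask stands in for a Bitmapset here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return a mask in which bit n is set if position n of the flattened list
 * is the first element of one of the original sub-lists.
 */
static uint32_t
toy_root_indexes(const int *sublist_lengths, int nsublists)
{
	uint32_t	roots = 0;
	int			offset = 0;
	int			i;

	for (i = 0; i < nsublists; i++)
	{
		assert(sublist_lengths[i] > 0);
		assert(offset < 32);	/* toy limit of the uint32_t mask */

		roots |= (uint32_t) 1 << offset;
		offset += sublist_lengths[i];
	}
	return roots;
}
```

For sub-lists of lengths 3, 2 and 1, the first elements land at flattened
offsets 0, 3 and 5.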

> There's also some code
> inside ExecFindInitialMatchingSubPlans() which does a backward scan
> over the partitions. This must process children before their parents.
> Unsure how well that's going to work if we start mixing the
> hierarchies. I'm sure it can be made to work providing each hierarchy
> is stored together consecutively in the array, but it just seems
> pretty fragile to me. That code is already pretty hard to follow.

I don't see how removing a nested loop changes things for the worse. AIUI,
the code replaces the index values contained in the subplan_map arrays of
the various PartitionedRelPruningData structs to account for any pruned
sub-plans. Removing a nesting level, because the nesting struct itself is
gone, doesn't seem to affect anything about that translation. But your
point here seems to be that the relative ordering of
PartitionedRelPruningData structs among themselves is affected by their
now being put into a flat array, although I don't see that as being any
more fragile. We already assume a good deal about the relative ordering of
sub-plans and of PartitionedRelPruningData structs, since we rely on
storing their indexes in subplan_map and subpart_map. Also, it occurred to
me that the new subplan indexes that ExecFindInitialMatchingSubPlans
computes are based on where subplans are actually stored in the
AppendState.appendplans array, which, in turn, is based on the Bitmapset of
"valid subplans" that ExecFindInitialMatchingSubPlans passes back to
ExecInitAppend.

> What's the reason you don't like the additional level to represent
> multiple hierarchies?

I started thinking about flattening PartitionedRelPruneInfo after looking
at flatten_partitioned_rels() in your patch. If we're flattening
partitioned_rels (that is, not having it as a List of Lists in the
finished plan), why not flatten the pruning info node too? As I said
earlier, I get it that we need List of Lists within the planner to get
make_partition_pruneinfo to work correctly in these types of queries, but
once we have figured out the correct details to pass to executor to
perform run-time pruning, I don't see why we need to pass that info again
as a List of Lists.

I have attached v2 of the delta patch, which adds a root_indexes field to
PartitionPruneInfo to track the topmost parent of every partition hierarchy
whose pruning info is contained in the Append.

Also, I noticed a bug in how ExecFindInitialMatchingSubPlans handles
other_subplans. While the indexes in the subplan_map arrays are updated to
contain revised values after pruning, those in the other_subplans
Bitmapset are not, leading to crashes or possibly wrong results. For example:

create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);

prepare q (int, int) as
select *
from (
select * from p
union all
select * from q1
union all
select 1, 1, 1
) s(a, b, c)
where s.a = $1 and s.b = $2 and s.c = (select 1);

set plan_cache_mode TO force_generic_plan;

explain (costs off, analyze) execute q (1, 1);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I have attached a fix for that as a delta patch, which results in:

explain (costs off, analyze) execute q (1, 1);
                            QUERY PLAN
──────────────────────────────────────────────────────────────────
 Append (actual time=0.153..0.179 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Result (actual time=0.023..0.032 rows=1 loops=1)
   Subplans Removed: 1
   ->  Seq Scan on p1 (actual time=0.022..0.022 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2) AND (c = $0))
   ->  Seq Scan on q111 (actual time=0.012..0.012 rows=0 loops=1)
         Filter: ((a = $1) AND (b = $2) AND (c = $0))
   ->  Result (actual time=0.014..0.022 rows=1 loops=1)
         One-Time Filter: ((1 = $1) AND (1 = $2) AND (1 = $0))
 Planning Time: 8.136 ms
 Execution Time: 0.562 ms
(12 rows)
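
The index translation that other_subplans was missing can be sketched in toy
C as below. This is not the code in the patch: both function names are
hypothetical, uint32_t masks stand in for Bitmapsets, and the example numbers
assume the four original subplans are ordered p1, p2, q111, Result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For each old subplan index, compute its index after pruned subplans are
 * removed, or -1 if it was pruned.  Returns the number of survivors.
 */
static int
toy_build_new_indexes(uint32_t valid, int nsubplans, int *new_indexes)
{
	int			newidx = 0;
	int			i;

	assert(nsubplans <= 32);	/* toy limit of the uint32_t mask */

	for (i = 0; i < nsubplans; i++)
	{
		if ((valid >> i) & 1u)
			new_indexes[i] = newidx++;
		else
			new_indexes[i] = -1;
	}
	return newidx;
}

/* Translate a set of old subplan indexes into the new numbering. */
static uint32_t
toy_remap_members(uint32_t oldset, const int *new_indexes, int nsubplans)
{
	uint32_t	newset = 0;
	int			i;

	for (i = 0; i < nsubplans; i++)
	{
		if (((oldset >> i) & 1u) && new_indexes[i] >= 0)
			newset |= (uint32_t) 1 << new_indexes[i];
	}
	return newset;
}
```

Under that assumed ordering, pruning p2 (old index 1) moves the Result
subplan from index 3 to index 2. A set that still holds the old index 3 would
then point past the end of the three remaining subplans, which is the kind of
stale reference described above.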

Thanks,
Amit

Attachments:

v3-0001-delta-v2.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b5f796f5ed..7d8ff821c7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -48,7 +48,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
 									 bool *isnull,
 									 int maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_matching_subplans_recurse(PartitionPruningData *pprune,
+static void find_matching_subplans_recurse(PartitionPruneState *prunestate,
 							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans);
@@ -1396,14 +1396,10 @@ adjust_partition_tlist(List *tlist, TupleConversionMap *map)
  *
  * 'partitionpruneinfo' is a PartitionPruneInfo as generated by
  * make_partition_pruneinfo.  Here we build a PartitionPruneState containing a
- * PartitionPruningData for each partitionpruneinfo->prune_infos, in
- * turn, a PartitionedRelPruningData is created for each
- * PartitionedRelPruneInfo stored in each 'prune_infos'.  This two-level system
- * is required in order to support run-time pruning with UNION ALL parents
- * containing one or more partitioned tables as children.  The data stored in
- * each PartitionedRelPruningData can be re-used each time we re-evaluate
- * which partitions match the pruning steps provided in each
- * PartitionedRelPruneInfo.
+ * PartitionedRelPruningData for each PartitionedRelPruneInfo
+ * in partitionpruneinfo->prune_infos.  The data stored in each
+ * PartitionedRelPruningData can be re-used each time we re-evaluate which
+ * partitions match the pruning steps provided in each PartitionedRelPruneInfo.
  */
 PartitionPruneState *
 ExecCreatePartitionPruneState(PlanState *planstate,
@@ -1422,14 +1418,15 @@ ExecCreatePartitionPruneState(PlanState *planstate,
 	 * Allocate the data structure
 	 */
 	prunestate = (PartitionPruneState *)
-		palloc(offsetof(PartitionPruneState, partprunedata) +
-			   sizeof(PartitionPruningData *) * n_part_hierarchies);
+		palloc(offsetof(PartitionPruneState, partrelprunedata) +
+			   sizeof(PartitionedRelPruningData) * n_part_hierarchies);
 
 	prunestate->num_partprunedata = n_part_hierarchies;
 	prunestate->do_initial_prune = false;	/* may be set below */
 	prunestate->do_exec_prune = false;	/* may be set below */
 	prunestate->execparamids = NULL;
 	prunestate->other_subplans = bms_copy(partitionpruneinfo->other_subplans);
+	prunestate->root_indexes = bms_copy(partitionpruneinfo->root_indexes);
 
 	/*
 	 * Create a short-term memory context which we'll use when making calls to
@@ -1445,127 +1442,112 @@ ExecCreatePartitionPruneState(PlanState *planstate,
 	i = 0;
 	foreach(lc, partitionpruneinfo->prune_infos)
 	{
+		PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc));
+		PartitionedRelPruningData *prelprune = &prunestate->partrelprunedata[i];
+		PartitionPruneContext *context = &prelprune->context;
+		PartitionDesc partdesc;
+		PartitionKey partkey;
+		int			partnatts;
+		int			n_steps;
 		ListCell   *lc2;
-		List	   *partrelpruneinfos = lfirst_node(List, lc);
-		PartitionPruningData *prunedata;
-		int			npartrelpruneinfos = list_length(partrelpruneinfos);
-		int			j;
 
-		prunedata = palloc(offsetof(PartitionPruningData, partrelprunedata) +
-						   npartrelpruneinfos * sizeof(PartitionedRelPruningData));
-		prunestate->partprunedata[i] = prunedata;
-		prunedata->num_partrelprunedata = npartrelpruneinfos;
+		/*
+		 * We must copy the subplan_map rather than pointing directly to
+		 * the plan's version, as we may end up making modifications to it
+		 * later.
+		 */
+		prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
+		memcpy(prelprune->subplan_map, pinfo->subplan_map,
+			   sizeof(int) * pinfo->nparts);
 
-		j = 0;
-		foreach(lc2, partrelpruneinfos)
+		/* We can use the subpart_map verbatim, since we never modify it */
+		prelprune->subpart_map = pinfo->subpart_map;
+
+		/* present_parts is also subject to later modification */
+		prelprune->present_parts = bms_copy(pinfo->present_parts);
+
+		/*
+		 * We need to hold a pin on the partitioned table's relcache entry so
+		 * that we can rely on its copies of the table's partition key and
+		 * partition descriptor.  We need not get a lock though; one should
+		 * have been acquired already by InitPlan or
+		 * ExecLockNonLeafAppendTables.
+		 */
+		context->partrel = relation_open(pinfo->reloid, NoLock);
+
+		partkey = RelationGetPartitionKey(context->partrel);
+		partdesc = RelationGetPartitionDesc(context->partrel);
+		n_steps = list_length(pinfo->pruning_steps);
+
+		context->strategy = partkey->strategy;
+		context->partnatts = partnatts = partkey->partnatts;
+		context->nparts = pinfo->nparts;
+		context->boundinfo = partdesc->boundinfo;
+		context->partcollation = partkey->partcollation;
+		context->partsupfunc = partkey->partsupfunc;
+
+		/* We'll look up type-specific support functions as needed */
+		context->stepcmpfuncs = (FmgrInfo *)
+			palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
+
+		context->ppccontext = CurrentMemoryContext;
+		context->planstate = planstate;
+
+		/* Initialize expression state for each expression we need */
+		context->exprstates = (ExprState **)
+			palloc0(sizeof(ExprState *) * n_steps * partnatts);
+		foreach(lc2, pinfo->pruning_steps)
 		{
-			PartitionedRelPruneInfo *pinfo = castNode(PartitionedRelPruneInfo, lfirst(lc2));
-			PartitionedRelPruningData *prelprune = &prunedata->partrelprunedata[j];
-			PartitionPruneContext *context = &prelprune->context;
-			PartitionDesc partdesc;
-			PartitionKey partkey;
-			int			partnatts;
-			int			n_steps;
+			PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc2);
 			ListCell   *lc3;
+			int			keyno;
 
-			/*
-			 * We must copy the subplan_map rather than pointing directly to
-			 * the plan's version, as we may end up making modifications to it
-			 * later.
-			 */
-			prelprune->subplan_map = palloc(sizeof(int) * pinfo->nparts);
-			memcpy(prelprune->subplan_map, pinfo->subplan_map,
-				   sizeof(int) * pinfo->nparts);
+			/* not needed for other step kinds */
+			if (!IsA(step, PartitionPruneStepOp))
+				continue;
 
-			/* We can use the subpart_map verbatim, since we never modify it */
-			prelprune->subpart_map = pinfo->subpart_map;
+			Assert(list_length(step->exprs) <= partnatts);
 
-			/* present_parts is also subject to later modification */
-			prelprune->present_parts = bms_copy(pinfo->present_parts);
-
-			/*
-			 * We need to hold a pin on the partitioned table's relcache entry
-			 * so that we can rely on its copies of the table's partition key
-			 * and partition descriptor.  We need not get a lock though; one
-			 * should have been acquired already by InitPlan or
-			 * ExecLockNonLeafAppendTables.
-			 */
-			context->partrel = relation_open(pinfo->reloid, NoLock);
-
-			partkey = RelationGetPartitionKey(context->partrel);
-			partdesc = RelationGetPartitionDesc(context->partrel);
-			n_steps = list_length(pinfo->pruning_steps);
-
-			context->strategy = partkey->strategy;
-			context->partnatts = partnatts = partkey->partnatts;
-			context->nparts = pinfo->nparts;
-			context->boundinfo = partdesc->boundinfo;
-			context->partcollation = partkey->partcollation;
-			context->partsupfunc = partkey->partsupfunc;
-
-			/* We'll look up type-specific support functions as needed */
-			context->stepcmpfuncs = (FmgrInfo *)
-				palloc0(sizeof(FmgrInfo) * n_steps * partnatts);
-
-			context->ppccontext = CurrentMemoryContext;
-			context->planstate = planstate;
-
-			/* Initialize expression state for each expression we need */
-			context->exprstates = (ExprState **)
-				palloc0(sizeof(ExprState *) * n_steps * partnatts);
-			foreach(lc3, pinfo->pruning_steps)
+			keyno = 0;
+			foreach(lc3, step->exprs)
 			{
-				PartitionPruneStepOp *step = (PartitionPruneStepOp *) lfirst(lc3);
-				ListCell   *lc4;
-				int			keyno;
+				Expr	   *expr = (Expr *) lfirst(lc3);
 
-				/* not needed for other step kinds */
-				if (!IsA(step, PartitionPruneStepOp))
-					continue;
-
-				Assert(list_length(step->exprs) <= partnatts);
-
-				keyno = 0;
-				foreach(lc4, step->exprs)
+				/* not needed for Consts */
+				if (!IsA(expr, Const))
 				{
-					Expr	   *expr = (Expr *) lfirst(lc4);
+					int			stateidx = PruneCxtStateIdx(partnatts,
+															step->step.step_id,
+															keyno);
 
-					/* not needed for Consts */
-					if (!IsA(expr, Const))
-					{
-						int			stateidx = PruneCxtStateIdx(partnatts,
-																step->step.step_id,
-																keyno);
-
-						context->exprstates[stateidx] =
-							ExecInitExpr(expr, context->planstate);
-					}
-					keyno++;
+					context->exprstates[stateidx] =
+						ExecInitExpr(expr, context->planstate);
 				}
+				keyno++;
 			}
-
-			/* Array is not modified at runtime, so just point to plan's copy */
-			context->exprhasexecparam = pinfo->hasexecparam;
-
-			prelprune->pruning_steps = pinfo->pruning_steps;
-			prelprune->do_initial_prune = pinfo->do_initial_prune;
-			prelprune->do_exec_prune = pinfo->do_exec_prune;
-
-			/* Record if pruning would be useful at any level */
-			prunestate->do_initial_prune |= pinfo->do_initial_prune;
-			prunestate->do_exec_prune |= pinfo->do_exec_prune;
-
-			/*
-			 * Accumulate the IDs of all PARAM_EXEC Params affecting the
-			 * partitioning decisions at this plan node.
-			 */
-			prunestate->execparamids = bms_add_members(prunestate->execparamids,
-													   pinfo->execparamids);
-
-			j++;
 		}
+
+		/* Array is not modified at runtime, so just point to plan's copy */
+		context->exprhasexecparam = pinfo->hasexecparam;
+
+		prelprune->pruning_steps = pinfo->pruning_steps;
+		prelprune->do_initial_prune = pinfo->do_initial_prune;
+		prelprune->do_exec_prune = pinfo->do_exec_prune;
+
+		/* Record if pruning would be useful at any level */
+		prunestate->do_initial_prune |= pinfo->do_initial_prune;
+		prunestate->do_exec_prune |= pinfo->do_exec_prune;
+
+		/*
+		 * Accumulate the IDs of all PARAM_EXEC Params affecting the
+		 * partitioning decisions at this plan node.
+		 */
+		prunestate->execparamids = bms_add_members(prunestate->execparamids,
+												   pinfo->execparamids);
+
 		i++;
 	}
+
 	return prunestate;
 }
 
@@ -1579,17 +1561,14 @@ ExecCreatePartitionPruneState(PlanState *planstate,
 void
 ExecDestroyPartitionPruneState(PartitionPruneState *prunestate)
 {
-	PartitionPruningData **partprunedata = prunestate->partprunedata;
+	PartitionedRelPruningData *partrelprunedata = prunestate->partrelprunedata;
 	int			i;
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune = partprunedata[i];
-		PartitionedRelPruningData *prunedata = pprune->partrelprunedata;
-		int			j;
+		PartitionedRelPruningData prunedata = partrelprunedata[i];
 
-		for (j = 0; j < pprune->num_partrelprunedata; j++)
-			relation_close(prunedata[j].context.partrel, NoLock);
+		relation_close(prunedata.context.partrel, NoLock);
 	}
 }
 
@@ -1623,14 +1602,21 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune;
 		PartitionedRelPruningData *prelprune;
 
-		pprune = prunestate->partprunedata[i];
-		prelprune = &pprune->partrelprunedata[0];
+		prelprune = &prunestate->partrelprunedata[i];
+
+		/*
+		 * Only call find_matching_subplans_recurse for the entries
+		 * corresponding to the topmost table of each partition hierarchy, as
+		 * the others are accessed recursively via
+		 * find_matching_subplans_recurse.
+		 */
+		if (!bms_is_member(i, prunestate->root_indexes))
+			continue;
 
 		/* Perform pruning without using PARAM_EXEC Params */
-		find_matching_subplans_recurse(pprune, prelprune, true, &result);
+		find_matching_subplans_recurse(prunestate, prelprune, true, &result);
 
 		/* Expression eval may have used space in node's ps_ExprContext too */
 		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
@@ -1694,61 +1680,57 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 		 * 'present_parts'.
 		 */
 
-		for (i = 0; i < prunestate->num_partprunedata; i++)
+		for (i = prunestate->num_partprunedata - 1; i >= 0; i--)
 		{
-			int			j;
-			PartitionPruningData *prunedata;
+			PartitionedRelPruningData *pprune;
+			int			nparts;
+			int			k;
 
-			prunedata = prunestate->partprunedata[i];
+			pprune = &prunestate->partrelprunedata[i];
+			nparts = pprune->context.nparts;
+			/* We just rebuild present_parts from scratch */
+			bms_free(pprune->present_parts);
+			pprune->present_parts = NULL;
 
-			for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+			for (k = 0; k < nparts; k++)
 			{
-				PartitionedRelPruningData *pprune;
-				int			nparts;
-				int			k;
+				int			oldidx = pprune->subplan_map[k];
+				int			subidx;
 
-				pprune = &prunedata->partrelprunedata[j];
-				nparts = pprune->context.nparts;
-				/* We just rebuild present_parts from scratch */
-				bms_free(pprune->present_parts);
-				pprune->present_parts = NULL;
-
-				for (k = 0; k < nparts; k++)
+				/*
+				 * If this partition is a leaf partition, then update its
+				 * subplan index.  The new index may have become -1 if the
+				 * subplan was pruned above, or it may have changed to a
+				 * lower value if some subplans earlier in the list were
+				 * being removed.
+				 */
+				if (oldidx >= 0)
 				{
-					int			oldidx = pprune->subplan_map[k];
-					int			subidx;
+					Assert(oldidx < nsubplans);
+					pprune->subplan_map[k] = new_subplan_indexes[oldidx];
 
-					/*
-					 * If this partition existed as a subplan then change the
-					 * old subplan index to the new subplan index.  The new
-					 * index may become -1 if the partition was pruned above,
-					 * or it may just come earlier in the subplan list due to
-					 * some subplans being removed earlier in the list.  If
-					 * it's a subpartition, add it to present_parts unless
-					 * it's entirely pruned.
-					 */
-					if (oldidx >= 0)
-					{
-						Assert(oldidx < nsubplans);
-						pprune->subplan_map[k] = new_subplan_indexes[oldidx];
+					/* Add to present_parts if the subplan wasn't pruned. */
+					if (new_subplan_indexes[oldidx] >= 0)
+						pprune->present_parts =
+							bms_add_member(pprune->present_parts, k);
+				}
+				/*
+				 * If this is a partitioned table, add to present_parts only
+				 * if at least one of its partitions survived pruning.
+				 */
+				else if ((subidx = pprune->subpart_map[k]) >= 0)
+				{
+					PartitionedRelPruningData *subprune;
 
-						if (new_subplan_indexes[oldidx] >= 0)
-							pprune->present_parts =
-								bms_add_member(pprune->present_parts, k);
-					}
-					else if ((subidx = pprune->subpart_map[k]) >= 0)
-					{
-						PartitionedRelPruningData *subprune;
+					subprune = &prunestate->partrelprunedata[subidx];
 
-						subprune = &prunedata->partrelprunedata[subidx];
-
-						if (!bms_is_empty(subprune->present_parts))
-							pprune->present_parts =
-								bms_add_member(pprune->present_parts, k);
-					}
+					if (!bms_is_empty(subprune->present_parts))
+						pprune->present_parts =
+							bms_add_member(pprune->present_parts, k);
 				}
 			}
 		}
+
 		pfree(new_subplan_indexes);
 	}
 
@@ -1777,18 +1759,26 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
 
 	for (i = 0; i < prunestate->num_partprunedata; i++)
 	{
-		PartitionPruningData *pprune;
 		PartitionedRelPruningData *prelprune;
 
-		pprune = prunestate->partprunedata[i];
-		prelprune = &pprune->partrelprunedata[0];
+		prelprune = &prunestate->partrelprunedata[i];
 
-		find_matching_subplans_recurse(pprune, prelprune, false, &result);
+		/*
+		 * Only call find_matching_subplans_recurse for the entries
+		 * corresponding to the topmost table of each partition hierarchy, as
+		 * the others are accessed recursively via
+		 * find_matching_subplans_recurse.
+		 */
+		if (!bms_is_member(i, prunestate->root_indexes))
+			continue;
+
+		find_matching_subplans_recurse(prunestate, prelprune, false, &result);
 
 		/* Expression eval may have used space in node's ps_ExprContext too */
 		ResetExprContext(prelprune->context.planstate->ps_ExprContext);
 	}
 
+
 	MemoryContextSwitchTo(oldcontext);
 
 	/* Copy result out of the temp context before we reset it */
@@ -1810,7 +1800,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
  * Adds valid (non-prunable) subplan IDs to *validsubplans
  */
 static void
-find_matching_subplans_recurse(PartitionPruningData *pprune,
+find_matching_subplans_recurse(PartitionPruneState *prunestate,
 							   PartitionedRelPruningData *prelprune,
 							   bool initial_prune,
 							   Bitmapset **validsubplans)
@@ -1854,8 +1844,8 @@ find_matching_subplans_recurse(PartitionPruningData *pprune,
 			int			partidx = prelprune->subpart_map[i];
 
 			if (partidx >= 0)
-				find_matching_subplans_recurse(pprune,
-											   &pprune->partrelprunedata[partidx],
+				find_matching_subplans_recurse(prunestate,
+											   &prunestate->partrelprunedata[partidx],
 											   initial_prune, validsubplans);
 			else
 			{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 7c8220cf65..a06358b048 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1183,6 +1183,7 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
 	PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
 
 	COPY_NODE_FIELD(prune_infos);
+	COPY_BITMAPSET_FIELD(root_indexes);
 	COPY_BITMAPSET_FIELD(other_subplans);
 
 	return newnode;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6269f474d2..391cd53dcf 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1018,6 +1018,7 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
 	WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
 
 	WRITE_NODE_FIELD(prune_infos);
+	WRITE_BITMAPSET_FIELD(root_indexes);
 	WRITE_BITMAPSET_FIELD(other_subplans);
 }
 
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3254524223..c565cfad92 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -2330,6 +2330,7 @@ _readPartitionPruneInfo(void)
 	READ_LOCALS(PartitionPruneInfo);
 
 	READ_NODE_FIELD(prune_infos);
+	READ_BITMAPSET_FIELD(root_indexes);
 	READ_BITMAPSET_FIELD(other_subplans);
 
 	READ_DONE();
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f9e6ad3ab7..c7872661c4 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1033,6 +1033,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
 	PartitionPruneInfo *partpruneinfo = NULL;
+	List	   *flattened_partitioned_rels = NIL;
 
 	/*
 	 * The subpaths list could be empty, if every child was proven empty by
@@ -1083,6 +1084,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 
 		prunequal = extract_actual_clauses(rel->baserestrictinfo, false);
 
+		flattened_partitioned_rels =
+					flatten_partitioned_rels(best_path->partitioned_rels);
+
 		if (best_path->path.param_info)
 		{
 			List	   *prmquals = best_path->path.param_info->ppi_clauses;
@@ -1098,6 +1102,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 			partpruneinfo =
 				make_partition_pruneinfo(root, rel,
 										 best_path->partitioned_rels,
+										 flattened_partitioned_rels,
 										 best_path->subpaths, prunequal);
 	}
 
@@ -1109,7 +1114,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	 */
 
 	plan = make_append(subplans, best_path->first_partial_path,
-					   tlist, best_path->partitioned_rels,
+					   tlist, flattened_partitioned_rels,
 					   partpruneinfo);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1135,6 +1140,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 	ListCell   *subpaths;
 	RelOptInfo *rel = best_path->path.parent;
 	PartitionPruneInfo *partpruneinfo = NULL;
+	List	   *flattened_partitioned_rels = NIL;
 
 	/*
 	 * We don't have the actual creation of the MergeAppend node split out
@@ -1233,6 +1239,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 
 		prunequal = extract_actual_clauses(rel->baserestrictinfo, false);
 
+		flattened_partitioned_rels =
+					flatten_partitioned_rels(best_path->partitioned_rels);
+
 		if (best_path->path.param_info)
 		{
 
@@ -1247,12 +1256,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
 
 		if (prunequal != NIL)
 			partpruneinfo = make_partition_pruneinfo(root, rel,
-													  best_path->partitioned_rels,
-													  best_path->subpaths, prunequal);
+													 best_path->partitioned_rels,
+													 flattened_partitioned_rels,
+													 best_path->subpaths, prunequal);
 	}
 
-	node->partitioned_rels =
-		flatten_partitioned_rels(best_path->partitioned_rels);
+	node->partitioned_rels = flattened_partitioned_rels;
 	node->mergeplans = subplans;
 	node->part_prune_info = partpruneinfo;
 
@@ -5006,7 +5015,7 @@ bitmap_subplan_mark_shared(Plan *plan)
 /*
  * flatten_partitioned_rels
  *		Convert List of Lists into a single List with all elements from the
-*		sub-lists.
+ *		sub-lists.
  */
 static List *
 flatten_partitioned_rels(List *partitioned_rels)
@@ -5380,8 +5389,9 @@ make_append(List *appendplans, int first_partial_plan,
 	plan->righttree = NULL;
 	node->appendplans = appendplans;
 	node->first_partial_plan = first_partial_plan;
-	node->partitioned_rels = flatten_partitioned_rels(partitioned_rels);
+	node->partitioned_rels = partitioned_rels;
 	node->part_prune_info = partpruneinfo;
+
 	return node;
 }
 
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9ce216c28b..ba06ff7119 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -114,6 +114,7 @@ typedef struct PruneStepResult
 static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
 							  RelOptInfo *parentrel,
 							  int *relid_subplan_map,
+							  int *relid_subpart_map,
 							  List *partitioned_rels, List *prunequal,
 							  Bitmapset **matchedsubplans);
 static List *gen_partprune_steps(RelOptInfo *rel, List *clauses,
@@ -195,12 +196,16 @@ static bool partkey_datum_from_expr(PartitionPruneContext *context,
  */
 PartitionPruneInfo *
 make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
-						 List *partitioned_rels, List *subpaths,
+						 List *partitioned_rels,
+						 List *flattened_partitioned_rels,
+						 List *subpaths,
 						 List *prunequal)
 {
 	PartitionPruneInfo *pruneinfo;
 	Bitmapset  *allmatchedsubplans = NULL;
+	Bitmapset  *root_indexes = NULL;
 	int		   *relid_subplan_map;
+	int		   *relid_subpart_map;
 	ListCell   *lc;
 	List	   *prunerelinfos;
 	int			i;
@@ -230,6 +235,38 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		relid_subplan_map[pathrel->relid] = i++;
 	}
 
+	/*
+	 * Construct a temporary array to map from planner relids to index of the
+	 * partitioned_rel.  For convenience, we use 1-based indexes here, so that
+	 * zero can represent an un-filled array entry.
+	 *
+	 * Also, since we're going to flatten the list before putting it into the
+	 * plan, use indexes into the flattened list in the mapping arrays of
+	 * resulting PartitionedRelPruneInfo nodes, instead of indexes into
+	 * individual sub-lists.
+	 */
+	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
+
+	/*
+	 * relid_subpart_map maps relid of a non-leaf partition to the index in
+	 * 'partitioned_rels' of that rel (which will also be the index in the
+	 * returned PartitionedRelPruneInfo list of the info for that partition).
+	 */
+	i = 1;
+	foreach(lc, flattened_partitioned_rels)
+	{
+		Index		rti = lfirst_int(lc);
+
+		Assert(rti < root->simple_rel_array_size);
+		/* No duplicates please */
+		Assert(relid_subpart_map[rti] == 0);
+		/* Same rel cannot be both leaf and non-leaf */
+		Assert(relid_subplan_map[rti] == 0);
+
+		relid_subpart_map[rti] = i++;
+	}
+
+
 	Assert(partitioned_rels->type == T_List);
 
 	prunerelinfos = NIL;
@@ -240,22 +277,29 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		List	   *rels = lfirst(lc);
 		List	   *prelinfolist;
 		Bitmapset  *matchedsubplans = NULL;
+		Index		root_rt_index = linitial_int(rels);
 
 		prelinfolist = make_partitionedrel_pruneinfo(root, parentrel,
 													 relid_subplan_map,
+													 relid_subpart_map,
 													 rels, prunequal,
 													 &matchedsubplans);
 
 		/* When pruning is possible, record the matched subplans */
 		if (prelinfolist != NIL)
 		{
-			prunerelinfos = lappend(prunerelinfos, prelinfolist);
+			prunerelinfos = list_concat(prunerelinfos,
+										list_copy(prelinfolist));
 			allmatchedsubplans = bms_join(matchedsubplans,
 										  allmatchedsubplans);
+			root_indexes =
+						bms_add_member(root_indexes,
+									   relid_subpart_map[root_rt_index] - 1);
 		}
 	}
 
 	pfree(relid_subplan_map);
+	pfree(relid_subpart_map);
 
 	/*
 	 * if none of the partition hierarchies had any useful run-time pruning
@@ -287,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 	else
 		pruneinfo->other_subplans = NULL;
 
+	/* There should be at least one member. */
+	Assert(bms_num_members(root_indexes) > 0);
+	pruneinfo->root_indexes = root_indexes;
+
 	return pruneinfo;
 }
 
@@ -310,45 +358,17 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
  */
 static List *
 make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
-							  int *relid_subplan_map,
+							  int *relid_subplan_map, int *relid_subpart_map,
 							  List *partitioned_rels, List *prunequal,
 							  Bitmapset **matchedsubplans)
 {
 	RelOptInfo *targetpart = NULL;
 	List	   *prelinfolist = NIL;
 	bool		doruntimeprune = false;
-	bool		hascontradictingquals = false;
 	ListCell   *lc;
-	int		   *relid_subpart_map;
 	Bitmapset  *subplansfound = NULL;
 	int			i;
 
-	/*
-	 * Construct a temporary array to map from planner relids to index of the
-	 * partitioned_rel.  For convenience, we use 1-based indexes here, so that
-	 * zero can represent an un-filled array entry.
-	 */
-	relid_subpart_map = palloc0(sizeof(int) * root->simple_rel_array_size);
-
-	/*
-	 * relid_subpart_map maps relid of a non-leaf partition to the index in
-	 * 'partitioned_rels' of that rel (which will also be the index in the
-	 * returned PartitionedRelPruneInfo list of the info for that partition).
-	 */
-	i = 1;
-	foreach(lc, partitioned_rels)
-	{
-		Index		rti = lfirst_int(lc);
-
-		Assert(rti < root->simple_rel_array_size);
-		/* No duplicates please */
-		Assert(relid_subpart_map[rti] == 0);
-		/* Same rel cannot be both leaf and non-leaf */
-		Assert(relid_subplan_map[rti] == 0);
-
-		relid_subpart_map[rti] = i++;
-	}
-
 	/* We now build a PartitionedRelPruneInfo for each partitioned rel */
 	foreach(lc, partitioned_rels)
 	{
@@ -477,8 +497,6 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		prelinfolist = lappend(prelinfolist, prelinfo);
 	}
 
-	pfree(relid_subpart_map);
-
 	if (!doruntimeprune)
 		return NIL;
 
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 4327fd4cb1..46f23f45de 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -147,17 +147,6 @@ typedef struct PartitionedRelPruningData
 	bool		do_exec_prune;
 } PartitionedRelPruningData;
 
-/*
- * PartitionPruningData - Encapsulates an array of PartitionedRelPruningData
- * which belong to a single partition hierarchy containing 1 or more
- * partitions.
- */
-typedef struct PartitionPruningData
-{
-	int			num_partrelprunedata;
-	PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
-} PartitionPruningData;
-
 /*-----------------------
  * PartitionPruneState - State object required for plan nodes to perform
  * run-time partition pruning.
@@ -185,9 +174,16 @@ typedef struct PartitionPruningData
  *						These must not be pruned.
  * prune_context		A short-lived memory context in which to execute the
  *						partition pruning functions.
- * partprunedata		Array of PartitionPruningData pointers for the plan's
- *						partitioned relation, ordered such that parent tables
- *						appear before children (hence, topmost table is first).
+ * root_indexes			Contains indexes of PartitionedRelPruningData in the
+ *						array below ('partprunedata') of the topmost
+ *						partitioned tables of each partition hierarchy
+ * partprunedata		Array of pointers of PartitionedRelPruningData structs
+ *						of partitioned relations contained in the plan,
+ *						ordered such that parent tables appear before children
+ *						(hence, the topmost table always appears first in the
+ *						sequence of PartitionedRelPruningData's of partitioned
+ *						tables in a given partition hierarchy and its index
+ *						is contained in 'root_indexes' as mentioned above).
  *-----------------------
  */
 typedef struct PartitionPruneState
@@ -198,7 +194,8 @@ typedef struct PartitionPruneState
 	Bitmapset  *execparamids;
 	Bitmapset  *other_subplans;
 	MemoryContext prune_context;
-	PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
+	Bitmapset  *root_indexes;
+	PartitionedRelPruningData partrelprunedata[FLEXIBLE_ARRAY_MEMBER];
 } PartitionPruneState;
 
 extern PartitionTupleRouting *ExecSetupPartitionTupleRouting(ModifyTableState *mtstate,
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a1a782d2f6..c057a5fc33 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1059,7 +1059,10 @@ typedef struct PlanRowMark
  * PartitionPruneInfo-  - Details required to allow the executor to prune
  * partitions.
  *
- * prune_infos			List of Lists containing PartitionedRelPruneInfo
+ * prune_infos			List of PartitionedRelPruneInfo's
+ * root_indexes			Indexes of PartitionedRelPruneInfo's in 'prune_infos'
+ *						of the topmost partitioned tables in partition
+ *						hierarchies contained in the plan
  * other_subplans		Indexes of any subplans which are not accounted for
  *						by any of the PartitionedRelPruneInfo stored in
  *						'prune_infos'.
@@ -1068,6 +1071,7 @@ typedef struct PartitionPruneInfo
 {
 	NodeTag		type;
 	List	   *prune_infos;
+	Bitmapset  *root_indexes;
 	Bitmapset  *other_subplans;
 } PartitionPruneInfo;
 
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index df3bcb737d..79398d1cc1 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -77,6 +77,7 @@ typedef struct PartitionPruneContext
 extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root,
 						 RelOptInfo *parentrel,
 						 List *partitioned_rels,
+						 List *flattened_partitioned_rels,
 						 List *subpaths, List *prunequal);
 extern Relids prune_append_rel_partitions(RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
v3-0001-other_subplans-bug-delta.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index dac789d414..b5f796f5ed 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1669,6 +1669,19 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
 				new_subplan_indexes[i] = newidx++;
 			else
 				new_subplan_indexes[i] = -1;	/* Newly pruned */
+
+			/*
+			 * If a subplan in other_subplans got its index updated, update
+			 * other_subplans too.
+			 */
+			if (bms_is_member(i, prunestate->other_subplans))
+			{
+				prunestate->other_subplans =
+							bms_del_member(prunestate->other_subplans, i);
+				prunestate->other_subplans =
+							bms_add_member(prunestate->other_subplans,
+										   new_subplan_indexes[i]);
+			}
 		}
 
 		/*
#35David Rowley
david.rowley@2ndquadrant.com
In reply to: David Rowley (#30)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 20 July 2018 at 01:03, David Rowley <david.rowley@2ndquadrant.com> wrote:

I've attached a patch intended for master which is just v2 based on
post 5220bb7533.

In [1] I mentioned that I think that bug should be fixed as part of
this bug fix too. It just seems a little strange to fix that one
separately when without the v3 patch for this fix the code still won't
work correctly when any subplans are present which don't belong in the
partition hierarchy.

I've attached a patch which can be applied on top of the v3 patch.

[1]: /messages/by-id/CAKJS1f9e7hHYP9SaT=-_RR4jdmm9VCgtDoC3-60s97EMjcfGWg@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

remove_incorrect_partprune_assert.patch (application/octet-stream)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 81dca39f6d..008f5b96c0 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -343,8 +343,6 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 		Assert(rti < root->simple_rel_array_size);
 		/* No duplicates please */
 		Assert(relid_subpart_map[rti] == 0);
-		/* Same rel cannot be both leaf and non-leaf */
-		Assert(relid_subplan_map[rti] == 0);
 
 		relid_subpart_map[rti] = i++;
 	}
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#34)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

[ getting back to this thread at last ]

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

On 2018/07/21 0:17, David Rowley wrote:

You could work around that by having some array that points to the
target partitioned table of each hierarchy, but I don't see why that's
better than having the additional struct.

Or it could be a Bitmapset called root_indexes that stores the offset of
the first Index value in every partitioned_rels list contained in turn in
the list that's passed to make_partition_pruneinfo.

I'm unimpressed with this solution --- that's just another independent
data structure that we'd have to keep in sync with the main one. (For
instance, if we ever added/removed PartitionedRelPruningData structs in
the list, we'd have to renumber that bitmapset's bits.) If we wanted to
go that way, it would make much more sense to add an "is_root" boolean
field to the PartitionedRelPruningData structs. However, I tend to agree
with David that flattening the partitioning struct tree isn't actually a
worthy goal to pursue. First, I don't see that it buys us much, and
second, I'm afraid we'll just end up undoing it or else adding annotations
that are morally equivalent to having the nested structure.
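
For illustration only, the "is_root" alternative mentioned above could look roughly like the sketch below. This is not what was committed; DemoPruneData, count_hierarchies() and remove_entry() are hypothetical names, and the struct is a heavily simplified stand-in for PartitionedRelPruningData. The point it tries to show is that a per-entry flag travels with its entry, so removing an entry needs no renumbering of a separate index set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PartitionedRelPruningData. */
typedef struct DemoPruneData
{
	int			relid;		/* which partitioned table this entry describes */
	bool		is_root;	/* true for the topmost table of a hierarchy */
} DemoPruneData;

/* Count the partition hierarchies in a flattened array of entries. */
int
count_hierarchies(const DemoPruneData *data, int n)
{
	int			count = 0;

	assert(data != NULL || n == 0);
	for (int i = 0; i < n; i++)
	{
		if (data[i].is_root)
			count++;
	}
	return count;
}

/*
 * Remove entry 'pos' by shifting the tail down; returns the new length.
 * The is_root flags move with their entries, so nothing else has to be
 * kept in sync.
 */
int
remove_entry(DemoPruneData *data, int n, int pos)
{
	assert(pos >= 0 && pos < n);
	for (int i = pos; i < n - 1; i++)
		data[i] = data[i + 1];
	return n - 1;
}
```

With a separate set of root positions, the same removal would also require shifting every stored position after 'pos', which is the synchronization burden objected to above.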

Also, I noticed a bug with how ExecFindInitialMatchingSubPlans handles
other_subplans. While the indexes in subplan_map arrays are updated to
contain revised values after pruning, those in the other_subplans
Bitmapset are not, leading to crashes or possibly wrong result.

This is actually a lovely example of why I dislike having a bunch of
auxiliary bitmapsets (or lists of ints) dangling off the plan tree.
They're maintenance headaches. I would rather fix this problem by
not having other_subplans in the first place. Or maybe we should get
rid of the subplan-renumbering business: that looks like bugs waiting to
happen, IMO, and I'm really unconvinced that it buys us anything that's
worth the overhead of doing it.
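
To make the renumbering hazard concrete, here is a minimal sketch under simplifying assumptions: a 32-bit mask stands in for a Bitmapset, and renumber_subplans() is a hypothetical helper, not the executor's code. It shows that once surviving subplans are renumbered, any set that names subplans by position (such as other_subplans) has to be remapped in the same pass or it points at the wrong subplans.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means subplan i is a member. */
typedef uint32_t SubplanSet;

/*
 * Renumber the subplans that survive initial pruning and remap 'other'
 * in step with them.
 *
 * keep       subplans that survived pruning
 * other      subplans that belong to no partition hierarchy
 * new_index  output: new position of subplan i, or -1 if it was pruned
 *
 * Returns 'other' expressed in the new numbering.
 */
SubplanSet
renumber_subplans(SubplanSet keep, SubplanSet other, int nsubplans,
				  int *new_index)
{
	SubplanSet	remapped = 0;
	int			newidx = 0;

	assert(nsubplans >= 0 && nsubplans <= 32);
	for (int i = 0; i < nsubplans; i++)
	{
		if (keep & ((SubplanSet) 1 << i))
		{
			new_index[i] = newidx;
			if (other & ((SubplanSet) 1 << i))
				remapped |= (SubplanSet) 1 << newidx;
			newidx++;
		}
		else
			new_index[i] = -1;	/* newly pruned */
	}
	return remapped;
}
```

For example, with four subplans where subplan 1 is pruned and subplan 3 is the only member of 'other', subplan 3 becomes subplan 2, so the remapped set must contain 2 rather than 3.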

Off to study David's last patch in more detail.

regards, tom lane

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#35)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

David Rowley <david.rowley@2ndquadrant.com> writes:

On 20 July 2018 at 01:03, David Rowley <david.rowley@2ndquadrant.com> wrote:

I've attached a patch intended for master which is just v2 based on
post 5220bb7533.

I've pushed the v3 patch with a lot of editorial work (e.g. cleaning
up comments you hadn't). I still want to think about getting rid of
some of the "extraneous" bitmapsets and lists that are running around
here ... but time grows short before beta3, and it's not clear that
that would be appropriate material to push into v11 anyway.

In [1] I mentioned that I think that bug should be fixed as part of
this bug fix too.

I didn't include this change because (a) it's late, (b) no test
case was included, and (c) I don't entirely believe it anyway.
How would a rel be both leaf and nonleaf? Isn't this indicative
of a problem somewhere upstream in the planner?

regards, tom lane

#38David Rowley
david.rowley@2ndquadrant.com
In reply to: Tom Lane (#37)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2 August 2018 at 11:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've pushed the v3 patch with a lot of editorial work (e.g. cleaning
up comments you hadn't).

Thanks for doing that.

In [1] I mentioned that I think that bug should be fixed as part of
this bug fix too.

I didn't include this change because (a) it's late, (b) no test
case was included, and (c) I don't entirely believe it anyway.
How would a rel be both leaf and nonleaf? Isn't this indicative
of a problem somewhere upstream in the planner?

It's probably best discussed on the other thread, but it seems to be
by design in accumulate_append_subpath(). Parallel Append nodes don't
get flattened if they contain a mix of parallel aware and non-parallel
aware subplans.

So you might be right, maybe a better option is to have that code
reorder the subplans so that the parallel aware ones stay at the end.
I'm just not convinced that fixing that code would mean it
could never happen again. It does not seem too outrageous to me to
support nested Appends, and with those, what else would the Path's
parent RelOptInfo be if it's not the partitioned table that the node's
subpaths belong to? Or maybe, in that case, the partitioned_rels
belonging to the sub-Append should not have been included in the List
for the top-level Append.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#37)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018-Aug-01, Tom Lane wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 20 July 2018 at 01:03, David Rowley <david.rowley@2ndquadrant.com> wrote:

I've attached a patch intended for master which is just v2 based on
post 5220bb7533.

I've pushed the v3 patch with a lot of editorial work (e.g. cleaning
up comments you hadn't).

Thanks Tom, much appreciated.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#40Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Alvaro Herrera (#39)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Hi,

Consider the below case:

CREATE TABLE pt (a INT, b INT, c INT) PARTITION BY RANGE(a);
CREATE TABLE pt_p1 PARTITION OF pt FOR VALUES FROM (1) to (6) PARTITION BY
RANGE (b);
CREATE TABLE pt_p1_p1 PARTITION OF pt_p1 FOR VALUES FROM (11) to (44);
CREATE TABLE pt_p1_p2 PARTITION OF pt_p1 FOR VALUES FROM (44) to (66);
INSERT INTO pt (a,b,c) VALUES
(1,11,111),(2,22,222),(3,33,333),(4,44,444),(5,55,555);

-- rule on root partition to first level child,
CREATE RULE pt_rule_ptp1 AS ON UPDATE TO pt DO INSTEAD UPDATE pt_p1 SET a =
new.a WHERE a = old.a;

-- The command below ends up with an error
UPDATE pt SET a = 3 WHERE a = 2;
ERROR: child rel 1 not found in append_rel_array

Here the update on the partitioned table fails if a rule is defined on the
partitioned table to redirect the update to the child table.

Looking further, I found that the test started failing with the commit below:

commit 1b54e91faabf3764b6786915881e514e42dccf89
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed Aug 1 19:42:46 2018 -0400

Fix run-time partition pruning for appends with multiple source rels.

The error comes from the code below, which tries to adjust the appendrel
attributes and ends up with an error from find_appinfos_by_relids().

/*
 * The prunequal is presented to us as a qual for 'parentrel'.
 * Frequently this rel is the same as targetpart, so we can skip
 * an adjust_appendrel_attrs step. But it might not be, and then
 * we have to translate. We update the prunequal parameter here,
 * because in later iterations of the loop for child partitions,
 * we want to translate from parent to child variables.
 */
if (parentrel != subpart)
{
    int         nappinfos;
    AppendRelInfo **appinfos = find_appinfos_by_relids(root,
                                                       subpart->relids,
                                                       &nappinfos);

    prunequal = (List *) adjust_appendrel_attrs(root, (Node *) prunequal,
                                                nappinfos,
                                                appinfos);

    pfree(appinfos);
}

Regards,

On Thu, Aug 2, 2018 at 8:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2018-Aug-01, Tom Lane wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 20 July 2018 at 01:03, David Rowley <david.rowley@2ndquadrant.com>

wrote:

I've attached a patch intended for master which is just v2 based on
post 5220bb7533.

I've pushed the v3 patch with a lot of editorial work (e.g. cleaning
up comments you hadn't).

Thanks Tom, much appreciated.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Rushabh Lathia
www.EnterpriseDB.com

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rushabh Lathia (#40)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Rushabh Lathia <rushabh.lathia@gmail.com> writes:

Consider the below case:

I initially thought the rule might be messing stuff up, but you can get
the same result without the rule by writing out the transformed query
by hand:

regression=# explain UPDATE pt_p1 SET a = 3 from pt
WHERE pt.a = 2 and pt.a = pt_p1.a;
ERROR: child rel 2 not found in append_rel_array

With enable_partition_pruning=off this goes through without an error.

I suspect the join pruning stuff is getting confused by the overlap
between the two partitioning trees involved in the join; although the
fact that one of them is the target rel must be related too, because
if you just write a SELECT for this join it's fine.

I rather doubt that this case worked before 1b54e91fa ... no time
to look closer today, though.

regards, tom lane

#42Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#41)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018/08/08 8:09, Tom Lane wrote:

Rushabh Lathia <rushabh.lathia@gmail.com> writes:

Consider the below case:

I initially thought the rule might be messing stuff up, but you can get
the same result without the rule by writing out the transformed query
by hand:

regression=# explain UPDATE pt_p1 SET a = 3 from pt
WHERE pt.a = 2 and pt.a = pt_p1.a;
ERROR: child rel 2 not found in append_rel_array

With enable_partition_pruning=off this goes through without an error.

I suspect the join pruning stuff is getting confused by the overlap
between the two partitioning trees involved in the join; although the
fact that one of them is the target rel must be related too, because
if you just write a SELECT for this join it's fine.

I rather doubt that this case worked before 1b54e91fa ... no time
to look closer today, though.

The code pointed out by Rushabh indeed seems to be the culprit in this case.

/*
 * The prunequal is presented to us as a qual for 'parentrel'.
 * Frequently this rel is the same as targetpart, so we can skip
 * an adjust_appendrel_attrs step. But it might not be, and then
 * we have to translate. We update the prunequal parameter here,
 * because in later iterations of the loop for child partitions,
 * we want to translate from parent to child variables.
 */
if (parentrel != subpart)
{
    int         nappinfos;
    AppendRelInfo **appinfos = find_appinfos_by_relids(root,
                                                       subpart->relids,
                                                       &nappinfos);

    prunequal = (List *) adjust_appendrel_attrs(root, (Node *) prunequal,
                                                nappinfos,
                                                appinfos);
    pfree(appinfos);
}

This code is looking for the case where we have to translate prunequal's
Vars from UNION ALL parent varno to actual partitioned parent's varno, but
the detection test (if (parentrel != subpart)) turns out to be unreliable,
as shown by this report. I think the test should be if (parentrel->relid !=
subpart->relid). Comparing pointers as is done now is fine without
inheritance_planner being involved, but not when it *is* involved, because
of the following piece of code in it that overwrites RelOptInfos:

/*
 * We need to collect all the RelOptInfos from all child plans into
 * the main PlannerInfo, since setrefs.c will need them. We use the
 * last child's simple_rel_array (previous ones are too short), so we
 * have to propagate forward the RelOptInfos that were already built
 * in previous children.
 */
Assert(subroot->simple_rel_array_size >= save_rel_array_size);
for (rti = 1; rti < save_rel_array_size; rti++)
{
    RelOptInfo *brel = save_rel_array[rti];

    if (brel)
        subroot->simple_rel_array[rti] = brel;
}
save_rel_array_size = subroot->simple_rel_array_size;
save_rel_array = subroot->simple_rel_array;
save_append_rel_array = subroot->append_rel_array;

With this, the RelOptInfos that would've been used to create Paths that
are currently under the subroot's final rel's best path, would no longer
be accessible through that subroot, because they're overwritten by the
corresponding ones in save_rel_array. Subsequently,
create_modifytable_plan() passes the 'subroot' to create_plan_recurse()
that will recurse down to make_partitionedrel_pruneinfo() via
create_append_plan(). The 'parentrel' RelOptInfo that's fetched off of
AppendPath is no longer reachable from 'subroot', because it's been
overwritten as mentioned above. 'subpart', the RelOptInfo (of the same RT
index) fetched from 'subroot' is thus not the same as 'parentrel'. So,
the if (parentrel != subpart) test is mistakenly satisfied, leading to the
failure of finding the needed AppendRelInfo, which makes sense, because
'subpart' is not really a child of anything.

Attached is a patch which modifies the if test to compare relids instead
of RelOptInfo pointers.
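
As a rough model of the failure described above, the following sketch may help; DemoRel and the three helper functions are hypothetical and stand in for RelOptInfo and the planner's arrays only loosely. It mirrors the propagation loop quoted above and shows how two distinct structs with the same RT index make a pointer comparison report "different rels" while a relid comparison reports "same rel".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RelOptInfo; only the RT index is modelled. */
typedef struct DemoRel
{
	int			relid;
} DemoRel;

/*
 * Modelled on the propagation loop in inheritance_planner(): entries saved
 * from earlier children overwrite the ones in the current array.
 */
void
propagate_saved_rels(DemoRel **current, DemoRel **saved, int nsaved)
{
	for (int rti = 1; rti < nsaved; rti++)
	{
		if (saved[rti] != NULL)
			current[rti] = saved[rti];
	}
}

/* The original test: identity of the structs. */
bool
same_rel_by_pointer(const DemoRel *a, const DemoRel *b)
{
	return a == b;
}

/* The proposed test: identity of the relation they describe. */
bool
same_rel_by_relid(const DemoRel *a, const DemoRel *b)
{
	assert(a != NULL && b != NULL);
	return a->relid == b->relid;
}
```

If a path keeps a pointer to the entry that existed before propagation, and the array entry for the same RT index is then overwritten, the pointer test fails even though both structs describe the same relation.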

Thanks,
Amit

Attachments:

partprune-1c2cb2744-thinko.patch (text/plain; charset=UTF-8)
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 752810d0e4..67f0dc5e59 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -384,7 +384,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
 			 * because in later iterations of the loop for child partitions,
 			 * we want to translate from parent to child variables.
 			 */
-			if (parentrel != subpart)
+			if (parentrel->relid != subpart->relid)
 			{
 				int			nappinfos;
 				AppendRelInfo **appinfos = find_appinfos_by_relids(root,
#43David Rowley
david.rowley@2ndquadrant.com
In reply to: Amit Langote (#42)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 8 August 2018 at 17:28, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is a patch which modifies the if test to compare relids instead
of RelOptInfo pointers.

Thanks for investigating and writing a patch. I agree with the fix.

It's probably worth writing a test that performs run-time pruning from
an inheritance planner plan. Do you want to include that in your
patch? If not, I can.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#43)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

David Rowley <david.rowley@2ndquadrant.com> writes:

On 8 August 2018 at 17:28, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is a patch which modifies the if test to compare relids instead
of RelOptInfo pointers.

Thanks for investigating and writing a patch. I agree with the fix.

I changed this to compare the relid sets not just rel->relid, since
rel->relid is only reliable for baserels. The partitioned rel could
safely be assumed to be a baserel, but I'm less comfortable with
supposing that the parentrel always will be. Otherwise, added a
test case based on Rushabh's example and pushed. (I'm not quite
sure if the plan will be stable enough to satisfy the buildfarm,
but we'll soon find out ...)

regards, tom lane

#45Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#44)
1 attachment(s)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018/08/09 0:48, Tom Lane wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:

On 8 August 2018 at 17:28, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Attached is a patch which modifies the if test to compare relids instead
of RelOptInfo pointers.

Thanks for investigating and writing a patch. I agree with the fix.

I changed this to compare the relid sets not just rel->relid, since
rel->relid is only reliable for baserels. The partitioned rel could
safely be assumed to be a baserel, but I'm less comfortable with
supposing that the parentrel always will be. Otherwise, added a
test case based on Rushabh's example and pushed. (I'm not quite
sure if the plan will be stable enough to satisfy the buildfarm,
but we'll soon find out ...)

Thank you for committing, agreed that comparing relid sets for equality
might be more future-proof.

About the test case, wondering if we should, like David seemed to suggest,
add a test case that would actually use run-time pruning? Maybe, even
better if the new test also had partitioned parent under UNION ALL parent
under ModifyTable. Something like in the attached?

One reason why we should adapt such a test case is that, in the future, we
may arrange for make_partitionedrel_pruneinfo(), whose code we just fixed,
to not be called if we know that run-time pruning is not needed. It seems
that that's true for the test added by the commit, that is, it doesn't
need run-time pruning.

Regards,
Amit

Attachments:

11e22e486-additional-test.patc (text/plain; charset=UTF-8)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 693c348185..61457862a9 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2478,6 +2478,8 @@ deallocate ab_q3;
 deallocate ab_q4;
 deallocate ab_q5;
 -- UPDATE on a partition subtree has been seen to have problems.
+set enable_hashjoin to off;
+set enable_mergejoin to off;
 insert into ab values (1,2);
 explain (analyze, costs off, summary off, timing off)
 update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
@@ -2556,6 +2558,69 @@ table ab;
  1 | 3
 (1 row)
 
+truncate ab;
+insert into ab values (1, 1), (1, 2), (1, 3);
+explain (analyze, costs off, summary off, timing off)
+update ab_a1 set b = 3 from (select * from (select * from ab_a2 union all select 1, 2) s where s.b = (select 2)) ss where ss.a = ab_a1.a;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Update on ab_a1 (actual rows=0 loops=1)
+   Update on ab_a1_b1
+   Update on ab_a1_b2
+   Update on ab_a1_b3
+   InitPlan 1 (returns $0)
+     ->  Result (actual rows=1 loops=1)
+   ->  Nested Loop (actual rows=1 loops=1)
+         ->  Append (actual rows=1 loops=1)
+               ->  Seq Scan on ab_a2_b1 (never executed)
+                     Filter: (b = $0)
+               ->  Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
+                     Filter: (b = $0)
+               ->  Seq Scan on ab_a2_b3 (never executed)
+                     Filter: (b = $0)
+               ->  Subquery Scan on "*SELECT* 2" (actual rows=1 loops=1)
+                     ->  Result (actual rows=1 loops=1)
+                           One-Time Filter: (2 = $0)
+         ->  Index Scan using ab_a1_b1_a_idx on ab_a1_b1 (actual rows=1 loops=1)
+               Index Cond: (a = ab_a2_b1.a)
+   ->  Nested Loop (actual rows=1 loops=1)
+         ->  Append (actual rows=1 loops=1)
+               ->  Seq Scan on ab_a2_b1 (never executed)
+                     Filter: (b = $0)
+               ->  Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
+                     Filter: (b = $0)
+               ->  Seq Scan on ab_a2_b3 (never executed)
+                     Filter: (b = $0)
+               ->  Subquery Scan on "*SELECT* 2_1" (actual rows=1 loops=1)
+                     ->  Result (actual rows=1 loops=1)
+                           One-Time Filter: (2 = $0)
+         ->  Index Scan using ab_a1_b2_a_idx on ab_a1_b2 (actual rows=1 loops=1)
+               Index Cond: (a = ab_a2_b1.a)
+   ->  Nested Loop (actual rows=1 loops=1)
+         ->  Append (actual rows=1 loops=1)
+               ->  Seq Scan on ab_a2_b1 (never executed)
+                     Filter: (b = $0)
+               ->  Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
+                     Filter: (b = $0)
+               ->  Seq Scan on ab_a2_b3 (never executed)
+                     Filter: (b = $0)
+               ->  Subquery Scan on "*SELECT* 2_2" (actual rows=1 loops=1)
+                     ->  Result (actual rows=1 loops=1)
+                           One-Time Filter: (2 = $0)
+         ->  Index Scan using ab_a1_b3_a_idx on ab_a1_b3 (actual rows=1 loops=1)
+               Index Cond: (a = ab_a2_b1.a)
+(45 rows)
+
+table ab;
+ a | b 
+---+---
+ 1 | 3
+ 1 | 3
+ 1 | 3
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
 drop table ab, lprt_a;
 -- Join
 create table tbl1(col1 int);
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 935c509b29..e78220a3e5 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -555,10 +555,19 @@ deallocate ab_q4;
 deallocate ab_q5;
 
 -- UPDATE on a partition subtree has been seen to have problems.
+set enable_hashjoin to off;
+set enable_mergejoin to off;
 insert into ab values (1,2);
 explain (analyze, costs off, summary off, timing off)
 update ab_a1 set b = 3 from ab where ab.a = 1 and ab.a = ab_a1.a;
 table ab;
+truncate ab;
+insert into ab values (1, 1), (1, 2), (1, 3);
+explain (analyze, costs off, summary off, timing off)
+update ab_a1 set b = 3 from (select * from (select * from ab_a2 union all select 1, 2) s where s.b = (select 2)) ss where ss.a = ab_a1.a;
+table ab;
+reset enable_hashjoin;
+reset enable_mergejoin;
 
 drop table ab, lprt_a;
 
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#45)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:

One reason why we should adapt such a test case is that, in the future, we
may arrange for make_partitionedrel_pruneinfo(), whose code we just fixed,
to not be called if we know that run-time pruning is not needed. It seems
that that's true for the test added by the commit, that is, it doesn't
need run-time pruning.

Not following your argument here. Isn't make_partition_pruneinfo
precisely what is in charge of figuring out whether run-time pruning
is possible?

(See my point in the other thread about Jaime's assertion crash,
that no run-time pruning actually would be possible for that query.
But we got to the assertion failure anyway.)

regards, tom lane

#47Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#46)
Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

On 2018/08/09 13:00, Tom Lane wrote:

> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>> One reason why we should adapt such a test case is that, in the future, we
>> may arrange for make_partitionedrel_pruneinfo(), whose code we just fixed,
>> to not be called if we know that run-time pruning is not needed. It seems
>> that that's true for the test added by the commit, that is, it doesn't
>> need run-time pruning.
>
> Not following your argument here. Isn't make_partition_pruneinfo
> precisely what is in charge of figuring out whether run-time pruning
> is possible?

With the current coding, yes, it is...

> (See my point in the other thread about Jaime's assertion crash,
> that no run-time pruning actually would be possible for that query.
> But we got to the assertion failure anyway.)

The first time I'd seen that make_partition_pruneinfo *always* gets called
from create_append_plan if rel->baserestrictinfo is non-NIL, I had
wondered whether we couldn't avoid doing it for the cases for which we'll
end up throwing away all that work anyway. But looking at the code now,
it may be a bit hard -- analyze_partkey_exprs(), which determines whether
we'll need any execution-time pruning, could not be called any sooner.

So, okay, I have to admit that my quoted argument isn't that strong.

Thanks,
Amit
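
For readers following the distinction between plan-time and execution-time pruning discussed above, a minimal sketch might look like the following. It reuses the t2 table from the original report and is only an illustration: it assumes the queries are run by a role that can see t2 (for example user b, or with the name qualified as b.t2), and that the uncorrelated subquery is kept as an InitPlan, in the same way as the `(select 2)` used in the regression test above. No plan output is shown, since it depends on the build being tested.

```sql
-- Sketch only: reuses t2 and its yearly partitions from the original report.
-- Run as user b, or qualify the table as b.t2.
set enable_partition_pruning to on;

-- Plan-time pruning: the comparison value is a constant, so the planner
-- can discard non-matching partitions before execution starts.
explain (costs off)
select * from t2 where k1 = timestamp '2017-06-01';

-- Execution-time pruning: the value comes from an uncorrelated subquery,
-- so it is only known once the executor has evaluated the InitPlan.
explain (analyze, costs off, summary off, timing off)
select * from t2 where k1 = (select timestamp '2017-06-01');
```

In the second plan, partitions skipped during execution should be reported as "(never executed)" rather than being absent from the plan, which is one way to tell the two kinds of pruning apart.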

#48Phil Florent
philflorent@hotmail.com
In reply to: Amit Langote (#47)
RE: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

Hi,
Thanks for your work; our prototype runs OK. PostgreSQL 11, with its now fully functional partitioning feature, is our validated choice to replace a well-known proprietary RDBMS in 100+ public hospitals for our DSS application.
Best regards
Phil

________________________________
From: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
Sent: Thursday, 9 August 2018 06:35
To: Tom Lane
Cc: David Rowley; Rushabh Lathia; Alvaro Herrera; Robert Haas; Phil Florent; PostgreSQL Hackers
Subject: Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian
