[Proposal] Make the optimiser aware of partitions ordering
Hello,
With native partitioning landing in Postgres 10, we (Julien Rouhaud and
myself) had the idea for the
following simple optimisation. This simple optimisation comes from a real use
case.
===== Proposal =====
With range partitioning, we guarantee that each partition contains non-
overlapping values. Since we know the range allowed for each partition, it is
possible to sort them according to the partition key (as is done already for
looking up partitions).
Thus, we ca generate sorted Append plans instead of MergeAppend when sorting
occurs on the partition key.
For example, consider the following model and query:
CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE TABLE child2 PARTITION OF parent FOR VALUES FROM (10) TO (20);
CREATE TABLE child1 PARTITION OF parent FOR VALUES FROM (1) TO (10);
CREATE INDEX ON child1(id);
CREATE INDEX ON child2(id);
EXPLAIN (COSTS OFF) SELECT * FROM parent ORDER BY id desc;
QUERY PLAN
--------------------------------------------------------------
Merge Append
Sort Key: parent.id DESC
-> Sort
Sort Key: parent.id DESC
-> Seq Scan on parent
-> Index Only Scan Backward using child2_id_idx on child2
-> Index Only Scan Backward using child1_id_idx on child
We can guarantee that every value found in child2 will have a greater id than
any value found in child1. Thus, we could generate a plan like this:
QUERY PLAN
---------------------------------------------------------------
Append
Sort Key: child2.id DESC
-> Index Only Scan Backward using child2_id_idx1 on child2
-> Index Only Scan Backward using child1_id_idx1 on child1
Skipping the merge step altogether.
This has the added benefit of providing an "early stop": if we add a limit to
the query, we will only scan the indexes that are necessary for fetching the
required amount of tuples. This is especially useful if we add a WHERE clause
not covered by the index with the limit. Consider the following example, with
a table "webvisits" partitioned by a ts colum. If we want to retrieve the
latest 100 hits from a specific ip:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM webvisits WHERE ipaddr =
'93.184.216.34' ORDER BY ts DESC LIMIT 10;
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..32.32 rows=10 width=104) (actual time=0.080..0.109 rows=10
loops=1)
Buffers: shared hit=7
-> Append (cost=0.43..401009.65 rows=125740 width=104) (actual
time=0.078..0.105 rows=10 loops=1)
Sort Key: webvisits_201712.ts DESC
Buffers: shared hit=7
-> Index Scan Backward using webvisits_201712_ipaddr_ts_idx on
webvisits_201712 (cost=0.43..133617.70 rows=41901 width=104) (actual
time=0.077..0.101 rows=10 loops=1)
Index Cond: (ipaddr = '93.184.216.34'::inet)
Buffers: shared hit=7
-> Index Scan Backward using webvisits_201711_ipaddr_ts_idx on
webvisits_201711 (cost=0.43..131514.18 rows=41243 width=104) (never executed)
Index Cond: (ipaddr = '93.184.216.34'::inet)
-> Index Scan Backward using webvisits_201710_ipaddr_ts_idx on
webvisits_201710 (cost=0.43..135801.71 rows=42587 width=104) (never executed)
........
We have developed a proof-of-concept implementation for this optimisation.
For sub partitioning, intermediate Append nodes can be generated.
Consider the following examples:
CREATE TABLE parentparent (c1 int, c2 int) PARTITION BY range (c1);
-- Subpartition only on second column
CREATE TABLE parent1 PARTITION OF parentparent FOR VALUES FROM (1) TO (10)
PARTITION BY range (c2);
-- Subpartition on both columns
CREATE TABLE parent2 PARTITION OF parentparent FOR VALUES FROM (10) TO (20)
PARTITION BY range (c1, c2);
CREATE TABLE child11 PARTITION OF parent1 FOR VALUES FROM (1) TO (10);
CREATE TABLE child12 PARTITION OF parent1 FOR VALUES FROM (10) TO (20);
CREATE TABLE child21 PARTITION OF parent2 FOR VALUES FROM (10, 1) TO (15, 10);
CREATE TABLE child22 PARTITION OF parent2 FOR VALUES FROM (15, 10) TO (20,
20);
EXPLAIN (COSTS OFF) SELECT * FROM parentparent ORDER BY c1;
QUERY PLAN
---------------------------------------
Append
Sort Key: parent1.c1
-> Sort
Sort Key: parent1.c1
-> Append
-> Seq Scan on parent1
-> Seq Scan on child11
-> Seq Scan on child12
-> Append
Sort Key: child21.c1
-> Sort
Sort Key: child21.c1
-> Seq Scan on child21
-> Sort
Sort Key: child22.c1
-> Seq Scan on child22
EXPLAIN (COSTS OFF) SELECT * FROM parentparent ORDER BY c1, c2;
QUERY PLAN
------------------------------------------------
Append
Sort Key: parent1.c1, parent1.c2
-> Sort
Sort Key: parent1.c1, parent1.c2
-> Append
-> Seq Scan on parent1
-> Seq Scan on child11
-> Seq Scan on child12
-> Append
Sort Key: child21.c1, child21.c2
-> Sort
Sort Key: child21.c1, child21.c2
-> Seq Scan on child21
-> Sort
Sort Key: child22.c1, child22.c2
-> Seq Scan on child22
===== Patch design =====
First, we don't really know what we're doing :). But this is a proof of
concept, and if there is interest in this patch, let us know how we should
have done things and we're willing to rewrite it entirely if needed.
==== Overview ====
In the optimiser, we generate PathKeys representing the two sort orders by
wich partition can be ordered (ASC and DESC), only for "native" range-based
partitioned tables, and store them in RelOptInfo associated with every parent
table, along with a list of OIDs corresponding to the partitions.
Then, when the time comes to generate append paths, we add another step which
tries to match the query_pathkeys to those of the partition, and generate
another append node with the sorted paths for each partitions, in the expected
order, and a pathkey to the append path itself to indicate its already sorted.
==== Known Problems with the patch ====
Once again, keep in mind it's a proof-of-concept
- It is in conflict with the existing patch designed to avoid adding the
parent partitions
to the append node. As such, it will almost certainly need a full rewrite to
adapt to that since that is done in this patch in a probably less than ideal
fashion.
- As of now, the patch doesn't try to match the partition pathkey to
mergejoinable pathkeys.
- maybe a new node should be created altogether, instead of porting most of
the MergeAppend struct fields to the basic Append ?
- the way we store the PathKeys and partitions OID directly in RelOptInfo is
probably wrong
- the major impact on existing queries is the fact that to handle sub-
partitioning, RangeTblEntries representing intermediate append nodes are added
(but just in the case of native partitioning, regular inheritance is not
affected). See updated prepunion.c for what that means. Those RTE are then
ignored
when generating the real Append nodes.
- new regression tests have not been written yet, and existing ones haven't
been "fixed" to take into account the different partition ordering: in case of
sub partitioning, the order is now "depth-first" rather than "breadth-first"
like it was earlier.
- no documentation has been added, although we don't really know where it
should go
- the patch lacks a lot of comments
- the key displayed in EXPLAIN output comes from the first partition, not the
parent.
- maybe a "real" SortPath should be generated, instead of generating Sort
nodes
out of nowhere when creating the plan. This has been done to conform to what
MergeAppend already did.
===== Questions =====
Is there interest for such an optimization ?
Should we work from the patch proposed to remove parent tables already ?
Should there be a new Node for such a "SortedAppend" operation or is it fine
keeping it with the Append node already ? Should that instead be an
optimization of the MergeAppend Node ?
What is or is not acceptable with regards to storing things in RelOptInfo
(among others) ?
More generally, any kind of feedback that will help us get on the right track
will be appreciated.
--
Ronan Dunklau & Julien Rouhaud
http://dalibo.com - http://dalibo.org
Attachments:
sorted_append_nodes_for_partitioning.patchtext/x-patch; charset=UTF-8; name=sorted_append_nodes_for_partitioning.patchDownload
commit 21e09fb1038b0fb48f32a9013bee64279af8dfd7
Author: Ronan Dunklau <ronan.dunklau@dalibo.com>
Date: Tue Feb 28 09:00:44 2017 +0100
Consider sorting order of partitions for append nodes
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c9b55ead3d..e566928f6c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -81,6 +81,8 @@ static void show_sort_keys(SortState *sortstate, List *ancestors,
ExplainState *es);
static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es);
+static void show_append_keys(AppendState *mstate, List *ancestors,
+ ExplainState *es);
static void show_agg_keys(AggState *astate, List *ancestors,
ExplainState *es);
static void show_grouping_sets(PlanState *planstate, Agg *agg,
@@ -1561,6 +1563,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_sort_keys(castNode(SortState, planstate), ancestors, es);
show_sort_info(castNode(SortState, planstate), es);
break;
+ case T_Append:
+ show_append_keys(castNode(AppendState, planstate),
+ ancestors, es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
@@ -1703,7 +1709,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
ancestors, es);
break;
case T_MergeAppend:
- ExplainMemberNodes(((MergeAppend *) plan)->mergeplans,
+ ExplainMemberNodes(((MergeAppend*) plan)->plan.appendplans,
((MergeAppendState *) planstate)->mergeplans,
ancestors, es);
break;
@@ -1901,7 +1907,23 @@ static void
show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es)
{
- MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+ Append *plan = (Append *) mstate->ps.plan;
+
+ show_sort_group_keys((PlanState *) mstate, "Sort Key",
+ plan->numCols, plan->sortColIdx,
+ plan->sortOperators, plan->collations,
+ plan->nullsFirst,
+ ancestors, es);
+}
+
+/*
+ * Likewise, for an Append node.
+ */
+static void
+show_append_keys(AppendState *mstate, List *ancestors,
+ ExplainState *es)
+{
+ Append *plan = (Append *) mstate->ps.plan;
show_sort_group_keys((PlanState *) mstate, "Sort Key",
plan->numCols, plan->sortColIdx,
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 7a20bf07a4..a586088211 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -63,6 +63,7 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
+ Append *append = &node->plan;
PlanState **mergeplanstates;
int nplans;
int i;
@@ -74,7 +75,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* Set up empty vector of subplan states
*/
- nplans = list_length(node->mergeplans);
+ nplans = list_length(append->appendplans);
mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
@@ -108,7 +109,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* results into the array "mergeplans".
*/
i = 0;
- foreach(lc, node->mergeplans)
+ foreach(lc, append->appendplans)
{
Plan *initNode = (Plan *) lfirst(lc);
@@ -125,17 +126,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize sort-key information
*/
- mergestate->ms_nkeys = node->numCols;
- mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * node->numCols);
+ mergestate->ms_nkeys = append->numCols;
+ mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * append->numCols);
- for (i = 0; i < node->numCols; i++)
+ for (i = 0; i < append->numCols; i++)
{
SortSupport sortKey = mergestate->ms_sortkeys + i;
sortKey->ssup_cxt = CurrentMemoryContext;
- sortKey->ssup_collation = node->collations[i];
- sortKey->ssup_nulls_first = node->nullsFirst[i];
- sortKey->ssup_attno = node->sortColIdx[i];
+ sortKey->ssup_collation = append->collations[i];
+ sortKey->ssup_nulls_first = append->nullsFirst[i];
+ sortKey->ssup_attno = append->sortColIdx[i];
/*
* It isn't feasible to perform abbreviated key conversion, since
@@ -146,7 +147,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
sortKey->abbreviate = false;
- PrepareSortSupportFromOrderingOp(node->sortOperators[i], sortKey);
+ PrepareSortSupportFromOrderingOp(append->sortOperators[i], sortKey);
}
/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 25fd051d6e..2535ca0eec 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -219,6 +219,18 @@ _copyModifyTable(const ModifyTable *from)
return newnode;
}
+static void
+copyAppendFields(const Append *from, Append *newnode)
+{
+ CopyPlanFields((const Plan *) from, (Plan *) newnode);
+ COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(numCols);
+ COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
+ COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
+ COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
+ COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
+}
+
/*
* _copyAppend
*/
@@ -226,17 +238,7 @@ static Append *
_copyAppend(const Append *from)
{
Append *newnode = makeNode(Append);
-
- /*
- * copy node superclass fields
- */
- CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
- /*
- * copy remainder of node
- */
- COPY_NODE_FIELD(appendplans);
-
+ copyAppendFields(from, newnode);
return newnode;
}
@@ -247,22 +249,7 @@ static MergeAppend *
_copyMergeAppend(const MergeAppend *from)
{
MergeAppend *newnode = makeNode(MergeAppend);
-
- /*
- * copy node superclass fields
- */
- CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
- /*
- * copy remainder of node
- */
- COPY_NODE_FIELD(mergeplans);
- COPY_SCALAR_FIELD(numCols);
- COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
- COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
- COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
- COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
-
+ copyAppendFields((const Append *)from, (Append *)newnode);
return newnode;
}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6e52eb7231..25a1822c18 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3743,7 +3743,7 @@ planstate_tree_walker(PlanState *planstate,
return true;
break;
case T_MergeAppend:
- if (planstate_walk_members(((MergeAppend *) plan)->mergeplans,
+ if (planstate_walk_members(((MergeAppend *) plan)->plan.appendplans,
((MergeAppendState *) planstate)->mergeplans,
walker, context))
return true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7418fbeded..af3c09e44b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -362,26 +362,11 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
}
static void
-_outAppend(StringInfo str, const Append *node)
+_outAppendInfo(StringInfo str, const Append *node)
{
- WRITE_NODE_TYPE("APPEND");
-
+ int i;
_outPlanInfo(str, (const Plan *) node);
-
WRITE_NODE_FIELD(appendplans);
-}
-
-static void
-_outMergeAppend(StringInfo str, const MergeAppend *node)
-{
- int i;
-
- WRITE_NODE_TYPE("MERGEAPPEND");
-
- _outPlanInfo(str, (const Plan *) node);
-
- WRITE_NODE_FIELD(mergeplans);
-
WRITE_INT_FIELD(numCols);
appendStringInfoString(str, " :sortColIdx");
@@ -399,6 +384,23 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
appendStringInfoString(str, " :nullsFirst");
for (i = 0; i < node->numCols; i++)
appendStringInfo(str, " %s", booltostr(node->nullsFirst[i]));
+
+}
+
+static void
+_outAppend(StringInfo str, const Append *node)
+{
+ WRITE_NODE_TYPE("APPEND");
+ _outAppendInfo(str, node);
+}
+
+
+
+static void
+_outMergeAppend(StringInfo str, const MergeAppend *node)
+{
+ WRITE_NODE_TYPE("MERGEAPPEND");
+ _outAppendInfo(str, (const Append *) &node->plan);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d3bbc02f24..771aec38d3 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1554,17 +1554,31 @@ _readModifyTable(void)
READ_DONE();
}
+
+static void
+ReadCommonAppend(Append* local_node)
+{
+ READ_TEMP_LOCALS();
+
+ ReadCommonPlan(&local_node->plan);
+
+ READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(numCols);
+ READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
+ READ_OID_ARRAY(sortOperators, local_node->numCols);
+ READ_OID_ARRAY(collations, local_node->numCols);
+ READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+}
+
/*
* _readAppend
*/
static Append *
_readAppend(void)
{
- READ_LOCALS(Append);
-
- ReadCommonPlan(&local_node->plan);
+ READ_LOCALS_NO_FIELDS(Append);
- READ_NODE_FIELD(appendplans);
+ ReadCommonAppend(local_node);
READ_DONE();
}
@@ -1575,16 +1589,9 @@ _readAppend(void)
static MergeAppend *
_readMergeAppend(void)
{
- READ_LOCALS(MergeAppend);
-
- ReadCommonPlan(&local_node->plan);
+ READ_LOCALS_NO_FIELDS(MergeAppend);
- READ_NODE_FIELD(mergeplans);
- READ_INT_FIELD(numCols);
- READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
- READ_OID_ARRAY(sortOperators, local_node->numCols);
- READ_OID_ARRAY(collations, local_node->numCols);
- READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+ ReadCommonAppend(&local_node->plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43bfd23804..851a6facdb 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -26,6 +26,7 @@
#include "foreign/fdwapi.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "miscadmin.h"
#ifdef OPTIMIZER_DEBUG
#include "nodes/print.h"
#endif
@@ -93,6 +94,9 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static int indexof_path_relation_in_oid_list(Oid oid, Oid* sorted_oids, int array_size);
+static void generate_sorted_append_paths(PlannerInfo *root, RelOptInfo *parent_rel, List *ordered_childrels);
+
static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -1185,6 +1189,24 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
int parentRTindex = rti;
List *live_childrels = NIL;
ListCell *l;
+ /* If the parent table is partitioned, sort the childrels according to
+ * the partitioning ASC ordering */
+ int currentidx = 0;
+ bool is_partitioned = rel->rel_sorted_part_oids != NIL;
+ int num_parts = list_length(rel->rel_sorted_part_oids);
+ Oid *parts_oids = palloc0(sizeof(Oid) * num_parts);
+ RelOptInfo **ordered_partition_rels = palloc0(sizeof(RelOptInfo*) * num_parts);
+ /* Transform the partition's oid list into an array */
+ if(is_partitioned)
+ {
+ ListCell *lc;
+ int i = 0;
+ foreach(lc, rel->rel_sorted_part_oids)
+ {
+ parts_oids[i] = lfirst_oid(lc);
+ i++;
+ }
+ }
/*
* Generate access paths for each member relation, and remember the
@@ -1226,10 +1248,39 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
if (IS_DUMMY_REL(childrel))
continue;
+
+
/*
- * Child is live, so add it to the live_childrels list for use below.
+ * Child is live, so add it to the ordered_partition_rel
+ * For partitioned cases, add the RelOptInfo in the right position of
+ * the sorted array. Parent partition table is guaranted to be empty,
+ * so exclude it. In the general case, just add it directly to the
+ * resulting list
*/
- live_childrels = lappend(live_childrels, childrel);
+ if ( is_partitioned ) {
+ /* Exclude childRTE generated to make sure the parent table would be
+ * scanned */
+ if(childRTE->relid != rte->relid)
+ {
+ int partindex = indexof_path_relation_in_oid_list(childRTE->relid, parts_oids, num_parts);
+ ordered_partition_rels[partindex] = childrel;
+ }
+ }
+ else
+ live_childrels = lappend(live_childrels, childrel);
+
+
+ currentidx++;
+ }
+
+ /* Finally, build the live_childrels list in the ASC order
+ * of the partition key */
+ if ( is_partitioned ) {
+ int i;
+ for(i = 0; i < list_length(rel->rel_sorted_part_oids); i++)
+ {
+ live_childrels = lappend(live_childrels, ordered_partition_rels[i]);
+ }
}
/* Add paths to the "append" relation. */
@@ -1260,6 +1311,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *all_child_outers = NIL;
ListCell *l;
+
+
/*
* For every non-dummy child, remember the cheapest path. Also, identify
* all pathkeys (orderings) and parameterizations (required_outer sets)
@@ -1288,6 +1341,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
else
partial_subpaths_valid = false;
+
+
/*
* Collect lists of all the available path orderings and
* parameterizations for all the children. We use these as a
@@ -1359,14 +1414,16 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0));
+ add_path(rel, (Path *) create_append_path(root, rel, subpaths, NULL, 0, NULL));
+ /* If we are partitioned, build ordered append path matching the
+ * PathKeys derived from the partition key */
+ generate_sorted_append_paths(root, rel, live_childrels);
/*
* Consider an append of partial unordered, unparameterized partial paths.
*/
if (partial_subpaths_valid)
{
- AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
@@ -1385,9 +1442,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Assert(parallel_workers > 0);
/* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers);
- add_partial_path(rel, (Path *) appendpath);
+ add_partial_path(rel, (Path *) create_append_path(root, rel, partial_subpaths, NULL,
+ parallel_workers, NULL));
}
/*
@@ -1396,7 +1452,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
*/
if (subpaths_valid)
generate_mergeappend_paths(root, rel, live_childrels,
- all_child_pathkeys);
+ all_child_pathkeys);
/*
* Build Append paths for each parameterization seen among the child rels.
@@ -1438,10 +1494,124 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0));
+ create_append_path(root, rel, subpaths, required_outer, 0, NULL));
}
}
+static int
+indexof_path_relation_in_oid_list(Oid oid, Oid* sorted_oids, int array_size)
+{
+ int i;
+
+ for(i=0; i < array_size; i++)
+ {
+ if(sorted_oids[i] == oid)
+ return i;
+ }
+
+ return -1;
+}
+
+
+static void
+generate_sorted_append_paths(PlannerInfo *root, RelOptInfo *parent_rel, List *ordered_childrels)
+{
+ ListCell *lc;
+ List *partitions_asc = ordered_childrels;
+ List *partitions_desc = NIL;
+ RangeTblEntry * parent_rte = planner_rt_fetch(parent_rel->relid, root);
+
+ if(parent_rte->relkind != RELKIND_PARTITIONED_TABLE)
+ return;
+
+ foreach(lc, partitions_asc)
+ {
+ partitions_desc = lcons(lfirst(lc), partitions_desc);
+ }
+
+ foreach(lc, parent_rel->rel_partitioned_pathkeys)
+ {
+ List *pathkeys = (List *) lfirst(lc);
+ PathKey *first = (PathKey *) linitial(pathkeys);
+ List *ordered_partitions = first->pk_strategy == BTLessStrategyNumber ?
+ partitions_asc : partitions_desc;
+ List *startup_subpaths = NIL;
+ List *total_subpaths = NIL;
+ List *sequential_subpaths = NIL;
+ bool startup_neq_total = false;
+ ListCell *lc2;
+ if(compare_pathkeys(pathkeys, root->query_pathkeys) == PATHKEYS_DIFFERENT)
+ {
+ continue;
+ }
+ foreach(lc2, ordered_partitions)
+ {
+ RelOptInfo *childrel = lfirst(lc2);
+ Path *cheapest_startup,
+ *cheapest_total,
+ *sequential = NULL;
+ /* The partition may have been pruned */
+ if (!childrel)
+ continue;
+
+ cheapest_startup = get_cheapest_path_for_pathkeys(childrel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ STARTUP_COST, false);
+ cheapest_total = get_cheapest_path_for_pathkeys(childrel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ TOTAL_COST, false);
+
+ /*
+ * If we can't find any paths with the right order just use the
+ * cheapest-total path; we'll have to sort it later.
+ */
+ if (cheapest_startup == NULL || cheapest_total == NULL)
+ {
+ cheapest_startup = cheapest_total =
+ childrel->cheapest_total_path;
+ /* Assert we do have an unparameterized path for this child */
+ Assert(cheapest_total->param_info == NULL);
+ }
+ /*
+ * Force a an unordered path, which could be cheaper in corner cases where
+ * orderedpaths are too expensive.
+ */
+ sequential = childrel->cheapest_total_path;
+
+ /*
+ * Notice whether we actually have different paths for the
+ * "cheapest" and "total" cases; frequently there will be no point
+ * in two create_merge_append_path() calls.
+ */
+ if (cheapest_startup != cheapest_total)
+ startup_neq_total = true;
+ startup_subpaths =
+ lappend(startup_subpaths, cheapest_startup);
+ total_subpaths =
+ lappend(total_subpaths, cheapest_total);
+ sequential_subpaths =
+ lappend(sequential_subpaths, sequential);
+
+ }
+ if(startup_subpaths)
+ {
+ add_path(parent_rel, (Path *) create_append_path(root, parent_rel, startup_subpaths,
+ NULL, 0, root->query_pathkeys));
+ }
+ if (startup_neq_total)
+ add_path(parent_rel, (Path *) create_append_path(root,
+ parent_rel, total_subpaths, NULL, 0, root->query_pathkeys));
+ if (sequential_subpaths){
+ add_path(parent_rel, (Path *) create_append_path(root,
+ parent_rel, sequential_subpaths, NULL, 0, root->query_pathkeys));
+ }
+ }
+}
+
+
+
/*
* generate_mergeappend_paths
* Generate MergeAppend paths for an append relation
@@ -1670,8 +1840,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
/* Discard any pre-existing paths; no further need for them */
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
-
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NULL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 0551668976..5b3c29b035 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0));
+ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 89e1946fc2..66d3ac6273 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -78,8 +78,9 @@ static List *get_gating_quals(PlannerInfo *root, List *quals);
static Plan *create_gating_plan(PlannerInfo *root, Path *path, Plan *plan,
List *gating_quals);
static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
-static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
-static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path);
+static Plan *create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path);
+static Plan *wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples);
+
static Result *create_result_plan(PlannerInfo *root, ResultPath *best_path);
static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
@@ -199,7 +200,7 @@ static CteScan *make_ctescan(List *qptlist, List *qpqual,
Index scanrelid, int ctePlanId, int cteParam);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist);
+static Append *make_append(NodeTag node_type, List *tlist);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -377,12 +378,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
(JoinPath *) best_path);
break;
case T_Append:
- plan = create_append_plan(root,
- (AppendPath *) best_path);
- break;
case T_MergeAppend:
- plan = create_merge_append_plan(root,
- (MergeAppendPath *) best_path);
+ plan = create_append_plan(best_path->pathtype, root,
+ (AppendPath *) best_path);
break;
case T_Result:
if (IsA(best_path, ProjectionPath))
@@ -976,13 +974,15 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
* Returns a Plan node.
*/
static Plan *
-create_append_plan(PlannerInfo *root, AppendPath *best_path)
+create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path)
{
- Append *plan;
+ Append *node;
+ Plan *plan;
List *tlist = build_path_tlist(root, &best_path->path);
+ List *pathkeys = best_path->path.pathkeys;
List *subplans = NIL;
ListCell *subpaths;
-
+ double limit_tuples = best_path->limit_tuples;
/*
* The subpaths list could be empty, if every child was proven empty by
* constraint exclusion. In that case generate a dummy plan that returns
@@ -994,9 +994,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
*/
if (best_path->subpaths == NIL)
{
- /* Generate a Result plan with constant-FALSE gating qual */
- Plan *plan;
-
plan = (Plan *) make_result(tlist,
(Node *) list_make1(makeBoolConst(false,
false)),
@@ -1006,63 +1003,10 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
return plan;
}
-
- /* Build the plan for each child */
- foreach(subpaths, best_path->subpaths)
- {
- Path *subpath = (Path *) lfirst(subpaths);
- Plan *subplan;
-
- /* Must insist that all children return the same tlist */
- subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
- subplans = lappend(subplans, subplan);
- }
-
- /*
- * XXX ideally, if there's just one child, we'd not bother to generate an
- * Append node but just return the single child. At the moment this does
- * not work because the varno of the child scan plan won't match the
- * parent-rel Vars it'll be asked to emit.
- */
-
- plan = make_append(subplans, tlist);
-
- copy_generic_path_info(&plan->plan, (Path *) best_path);
-
- return (Plan *) plan;
-}
-
-/*
- * create_merge_append_plan
- * Create a MergeAppend plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
-{
- MergeAppend *node = makeNode(MergeAppend);
- Plan *plan = &node->plan;
- List *tlist = build_path_tlist(root, &best_path->path);
- List *pathkeys = best_path->path.pathkeys;
- List *subplans = NIL;
- ListCell *subpaths;
-
- /*
- * We don't have the actual creation of the MergeAppend node split out
- * into a separate make_xxx function. This is because we want to run
- * prepare_sort_from_pathkeys on it before we do so on the individual
- * child plans, to make cross-checking the sort info easier.
- */
+ node = make_append(node_type, tlist);
+ plan = &node->plan;
copy_generic_path_info(plan, (Path *) best_path);
plan->targetlist = tlist;
- plan->qual = NIL;
- plan->lefttree = NULL;
- plan->righttree = NULL;
-
- /* Compute sort column info, and adjust MergeAppend's tlist as needed */
(void) prepare_sort_from_pathkeys(plan, pathkeys,
best_path->path.parent->relids,
NULL,
@@ -1073,72 +1017,21 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
&node->collations,
&node->nullsFirst);
- /*
- * Now prepare the child plans. We must apply prepare_sort_from_pathkeys
- * even to subplans that don't need an explicit sort, to make sure they
- * are returning the same sort key columns the MergeAppend expects.
- */
+ /* Build the plan for each child */
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
- Plan *subplan;
- int numsortkeys;
- AttrNumber *sortColIdx;
- Oid *sortOperators;
- Oid *collations;
- bool *nullsFirst;
-
- /* Build the child plan */
- /* Must insist that all children return the same tlist */
- subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
- /* Compute sort column info, and adjust subplan's tlist as needed */
- subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
- subpath->parent->relids,
- node->sortColIdx,
- false,
- &numsortkeys,
- &sortColIdx,
- &sortOperators,
- &collations,
- &nullsFirst);
-
- /*
- * Check that we got the same sort key information. We just Assert
- * that the sortops match, since those depend only on the pathkeys;
- * but it seems like a good idea to check the sort column numbers
- * explicitly, to ensure the tlists really do match up.
- */
- Assert(numsortkeys == node->numCols);
- if (memcmp(sortColIdx, node->sortColIdx,
- numsortkeys * sizeof(AttrNumber)) != 0)
- elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
- Assert(memcmp(sortOperators, node->sortOperators,
- numsortkeys * sizeof(Oid)) == 0);
- Assert(memcmp(collations, node->collations,
- numsortkeys * sizeof(Oid)) == 0);
- Assert(memcmp(nullsFirst, node->nullsFirst,
- numsortkeys * sizeof(bool)) == 0);
-
- /* Now, insert a Sort node if subplan isn't sufficiently ordered */
- if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
- {
- Sort *sort = make_sort(subplan, numsortkeys,
- sortColIdx, sortOperators,
- collations, nullsFirst);
-
- label_sort_with_costsize(root, sort, best_path->limit_tuples);
- subplan = (Plan *) sort;
- }
-
+ /* TODO: decrease limit tuples for each subpath */
+ Plan *subplan = plan = wrap_sort(root, node, subpath, pathkeys, limit_tuples);
+ if(limit_tuples > 0)
+ limit_tuples = Max(1, limit_tuples - subpath->rows);
subplans = lappend(subplans, subplan);
}
-
- node->mergeplans = subplans;
-
+ node->appendplans = subplans;
return (Plan *) node;
}
+
/*
* create_result_plan
* Create a Result plan for 'best_path'.
@@ -1537,7 +1430,7 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
* anyway. Usually create_projection_path will have detected that and set
* dummypp if we don't need a Result; but its decision can't be final,
* because some createplan.c routines change the tlists of their nodes.
- * (An example is that create_merge_append_plan might add resjunk sort
+ * (An example is that create_append_plan might add resjunk sort
* columns to a MergeAppend.) So we have to recheck here. If we do
* arrive at a different answer than create_projection_path did, we'll
* have made slightly wrong cost estimates; but label the plan with the
@@ -5161,20 +5054,85 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist)
+make_append(NodeTag node_type, List *tlist)
{
- Append *node = makeNode(Append);
- Plan *plan = &node->plan;
+ Append *node;
+ Plan *plan;
+ if(node_type != T_Append && node_type != T_MergeAppend)
+ elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
- plan->targetlist = tlist;
+ switch(node_type)
+ {
+ case T_Append:
+ node = makeNode(Append);
+ break;
+ case T_MergeAppend:
+ node = (Append *) makeNode(MergeAppend);
+ break;
+ default:
+ elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
+ }
+
+ plan = &node->plan; plan->targetlist = tlist;
plan->qual = NIL;
plan->lefttree = NULL;
plan->righttree = NULL;
- node->appendplans = appendplans;
-
+ node->appendplans = NIL;
+ node->numCols = 0;
+ node->sortColIdx = NULL;
+ node->sortOperators = NULL;
+ node->collations = NULL;
+ node->nullsFirst = NULL;
return node;
}
+static Plan *
+wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples)
+{
+ int numCols;
+ AttrNumber *sortColIdx;
+ Oid *sortOperators;
+ Oid *collations;
+ bool *nullsFirst;
+ Plan *subplan;
+ subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
+ if(pathkeys != NIL)
+ {
+ subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
+ subpath->parent->relids,
+ parentplan->sortColIdx,
+ false,
+ &numCols,
+ &sortColIdx,
+ &sortOperators,
+ &collations,
+ &nullsFirst);
+ Assert(numCols == parentplan->numCols);
+ if (memcmp(sortColIdx, parentplan->sortColIdx,
+ numCols * sizeof(AttrNumber)) != 0)
+ elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
+ Assert(memcmp(sortOperators, parentplan->sortOperators,
+ numCols * sizeof(Oid)) == 0);
+ Assert(memcmp(collations, parentplan->collations,
+ numCols * sizeof(Oid)) == 0);
+ Assert(memcmp(nullsFirst, parentplan->nullsFirst,
+ numCols * sizeof(bool)) == 0);
+ /* Now, insert a Sort node if subplan isn't sufficiently ordered */
+ if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
+ {
+ Sort *sort = make_sort(subplan, numCols,
+ sortColIdx, sortOperators,
+ collations, nullsFirst);
+
+ label_sort_with_costsize(root, sort, limit_tuples);
+ subplan = (Plan *) sort;
+ }
+
+ }
+ return subplan;
+
+}
+
static RecursiveUnion *
make_recursive_union(List *tlist,
Plan *lefttree,
@@ -5587,6 +5545,7 @@ prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
break; /* found usable expression */
}
}
+
if (!j)
elog(ERROR, "could not find pathkey item to sort");
@@ -6088,7 +6047,6 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
tle = NULL;
}
}
-
if (!tle)
elog(ERROR, "could not find pathkey item to sort");
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3c58d0596c..43d3a57e60 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -25,6 +25,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
#include "optimizer/planmain.h"
@@ -149,6 +150,7 @@ query_planner(PlannerInfo *root, List *tlist,
*/
build_base_rel_tlists(root, tlist);
+
find_placeholders_in_jointree(root);
find_lateral_references(root);
@@ -176,6 +178,14 @@ query_planner(PlannerInfo *root, List *tlist,
*/
(*qp_callback) (root, qp_extra);
+
+ /*
+ * We consider generating pathkeys for partitionned tables only if the
+ * query has some ordering
+ */
+ if (root->query_pathkeys != NIL)
+ generate_pathkeys_for_partitioned_tables(root);
+
/*
* Examine any "placeholder" expressions generated during subquery pullup.
* Make sure that the Vars they need are marked as needed at the relevant
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 02286d9c52..2e5f23182c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3345,10 +3345,12 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
+ create_append_path(root,
+ grouped_rel,
paths,
NULL,
- 0);
+ 0,
+ NULL);
path->pathtarget = target;
}
else
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 5f3027e96f..ec579c7141 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -892,8 +892,8 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
* targetlist or check quals.
*/
set_dummy_tlist_references(plan, rtoffset);
- Assert(splan->plan.qual == NIL);
- foreach(l, splan->mergeplans)
+ Assert(splan->plan.plan.qual == NIL);
+ foreach(l, splan->plan.appendplans)
{
lfirst(l) = set_plan_refs(root,
(Plan *) lfirst(l),
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6fa6540662..c61238e22d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2565,7 +2565,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
{
ListCell *l;
- foreach(l, ((MergeAppend *) plan)->mergeplans)
+ foreach(l, ((MergeAppend *) plan)->plan.appendplans)
{
context.paramids =
bms_add_members(context.paramids,
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1389db18ba..077c179349 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -566,7 +566,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -678,7 +678,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0);
+ path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -1405,7 +1405,14 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
lockmode = AccessShareLock;
/* Scan for all members of inheritance set, acquire needed locks */
- inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+ if (rte->relkind == RELKIND_PARTITIONED_TABLE){
+ inhOIDs = find_inheritance_children(parentOID, lockmode);
+ /* find_all_inheritors adds self ourself, but find_inheritance_children
+ * does not. We need to add it. */
+ inhOIDs = lcons_oid(parentOID, inhOIDs);
+ } else {
+ inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+ }
/*
* Check that there's at least one descendant, else treat as no-child
@@ -1476,7 +1483,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
childrte = copyObject(rte);
childrte->relid = childOID;
childrte->relkind = newrelation->rd_rel->relkind;
- childrte->inh = false;
+ childrte->inh = rte->relkind == RELKIND_PARTITIONED_TABLE && childOID != parentOID;
childrte->requiredPerms = 0;
childrte->securityQuals = NIL;
parse->rtable = lappend(parse->rtable, childrte);
@@ -1495,6 +1502,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
appinfo->parent_reloid = parentOID;
appinfos = lappend(appinfos, appinfo);
+
/*
* Translate the column permissions bitmaps to the child's attnums (we
* have to build the translated_vars list before we can do this). But
@@ -1537,6 +1545,13 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
root->rowMarks = lappend(root->rowMarks, newrc);
}
+ /*
+ * Recursively expand the child for partitioned tables
+ * For regular inheritance, all children are flattened
+ */
+ expand_inherited_rtentry(root, childrte, childRTindex);
+
+
/* Close child relations, but keep locks */
if (childOID != parentOID)
heap_close(newrelation, NoLock);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8ce772d274..4271dfde49 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1200,12 +1200,17 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers)
+create_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths, Relids required_outer,
+ int parallel_workers, List *pathkeys)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
-
+ double limit_tuples;
+ Cost input_startup_cost;
+ Cost input_total_cost;
+ /* Just to make sure that nobody is trying to build a parallel sorted append
+ * path */
+ Assert(parallel_workers > 0 ? pathkeys == NIL : true);
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
pathnode->path.pathtarget = rel->reltarget;
@@ -1214,8 +1219,7 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
- pathnode->path.pathkeys = NIL; /* result is always considered
- * unsorted */
+ pathnode->path.pathkeys = pathkeys;
pathnode->subpaths = subpaths;
/*
@@ -1229,22 +1233,48 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.rows = 0;
pathnode->path.startup_cost = 0;
pathnode->path.total_cost = 0;
+ if (root && bms_equal(rel->relids, root->all_baserels))
+ pathnode->limit_tuples = root->limit_tuples;
+ else
+ pathnode->limit_tuples = -1.0;
+ limit_tuples = pathnode->limit_tuples;
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
+ input_startup_cost = subpath->startup_cost;
+ input_total_cost = subpath->total_cost;
- pathnode->path.rows += subpath->rows;
+ if(pathkeys && !pathkeys_contained_in(pathkeys, subpath->pathkeys))
+ {
+ Path sort_path; /* dummy for result of cost_sort */
+ cost_sort(&sort_path,
+ root,
+ pathkeys,
+ subpath->total_cost,
+ subpath->rows,
+ subpath->pathtarget->width,
+ 0.0,
+ work_mem,
+ limit_tuples);
+ input_startup_cost += sort_path.startup_cost;
+ input_total_cost = sort_path.total_cost;
+ }
+
+ pathnode->path.rows += subpath->rows;
if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
+ pathnode->path.startup_cost = input_startup_cost;
+ /* Consider that the number of tuples to be fetched decreases
+ * for every subsequent partition */
+ if(limit_tuples > 0)
+ limit_tuples = Max(1, limit_tuples - subpath->rows);
+
+ pathnode->path.total_cost += input_total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
subpath->parallel_safe;
-
/* All child paths must have same parameterization */
Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
}
-
return pathnode;
}
@@ -3208,7 +3238,6 @@ create_limit_path(PlannerInfo *root, RelOptInfo *rel,
int64 offset_est, int64 count_est)
{
LimitPath *pathnode = makeNode(LimitPath);
-
pathnode->path.pathtype = T_Limit;
pathnode->path.parent = rel;
/* Limit doesn't project, so use source path's pathtarget */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 463f806467..c51685e2be 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -34,6 +34,7 @@
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
+#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/predtest.h"
#include "optimizer/prep.h"
@@ -409,7 +410,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
rel->serverid = InvalidOid;
rel->fdwroutine = NULL;
}
-
/* Collect info about relation's foreign keys, if relevant */
get_relation_foreign_keys(root, rel, relation, inhparent);
@@ -1251,6 +1251,139 @@ get_relation_constraints(PlannerInfo *root,
return result;
}
+/*
+ * Generate pathkeys for Range-based partitions
+ */
+void
+generate_pathkeys_for_partitioned_tables(PlannerInfo *root)
+{
+ int i;
+ for(i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[i];
+ /* Only base relation can be partitionned */
+ if(rel && has_useful_pathkeys(root, rel))
+ {
+ Index varno = rel->relid;
+ Index parent_varno = varno;
+ RelOptInfo *parent_rel = rel;
+ Relation relation;
+ RangeTblEntry *rte = planner_rt_fetch(varno, root);
+ if(rte->relkind != RELKIND_PARTITIONED_TABLE)
+ continue;
+ while(parent_rel->reloptkind != RELOPT_BASEREL)
+ {
+ ListCell *lc;
+ foreach(lc, root->append_rel_list)
+ {
+ AppendRelInfo *ari = lfirst(lc);
+ if(ari->child_relid == parent_rel->relid){
+ parent_rel = root->simple_rel_array[ari->parent_relid];
+ break;
+ }
+ /* Should never happen: we should always be able to climb up the
+ * inheritance tree */
+ if(!lc)
+ {
+ elog(ERROR, "Unable to find parent table for child ");
+ }
+
+ }
+
+ }
+ parent_varno = parent_rel->relid;
+
+
+ /*
+ * Ignore base rel if it's not a partitionned table. We'll still
+ * have to open it to verify if it's a range partitionned table
+ */
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ continue;
+
+ relation = heap_open(rte->relid, NoLock);
+
+ /*
+ * Store the child partitions OIDs and build pathkeys for the
+ * partitioning keys
+ */
+ if(relation->rd_partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ int j;
+ ListCell *lc;
+ Oid equality_op;
+ EquivalenceClass *ec;
+ List *opfamilies;
+ List *asc_pathkeys = NIL;
+ List *desc_pathkeys = NIL;
+ Assert(rel->rel_sorted_part_oids == NIL);
+ Assert(rel->rel_partitioned_pathkeys == NIL);
+
+ for(j = 0; j < relation->rd_partdesc->nparts; j++)
+ {
+ rel->rel_sorted_part_oids =
+ lappend_oid(rel->rel_sorted_part_oids,
+ relation->rd_partdesc->oids[j]);
+ }
+
+ /* Lookup individual vars from the pathtarget */
+ /* FIXME: refactor this in an external function */
+ lc = list_head(relation->rd_partkey->partexprs);
+ for(j=0; j < relation->rd_partkey->partnatts; j++)
+ {
+ AttrNumber attno = relation->rd_partkey->partattrs[j];
+ Expr *expr = NULL;
+ /* This is not an attribute, but an expression */
+ if(attno == InvalidAttrNumber)
+ {
+ /* Should never append : we should be able to fetch
+ * an expression for anything in the partition key */
+ if (!lc)
+ elog(ERROR, "Could not find expression for partition key");
+ expr = lfirst(lc);
+ lc = lnext(lc);
+ }
+ else
+ {
+ expr = (Expr*) makeVar(parent_varno, attno, relation->rd_partkey->parttypid[j],
+ relation->rd_partkey->parttypmod[j],
+ relation->rd_partkey->parttypcoll[j],
+ 0);
+ }
+ equality_op = get_opfamily_member(relation->rd_partkey->partopfamily[j],
+ relation->rd_partkey->partopcintype[j],
+ relation->rd_partkey->partopcintype[j],
+ BTEqualStrategyNumber);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ ec = get_eclass_for_sort_expr(root, expr,
+ NULL, opfamilies,
+ relation->rd_partkey->partopcintype[j],
+ relation->rd_partkey->partcollation[j],
+ 0, rel->relids, true);
+ asc_pathkeys = lappend(asc_pathkeys, make_canonical_pathkey(root,
+ ec,
+ relation->rd_partkey->partopfamily[j],
+ BTLessStrategyNumber, false));
+ desc_pathkeys = lappend(desc_pathkeys, make_canonical_pathkey(root,
+ ec,
+ relation->rd_partkey->partopfamily[j],
+ BTGreaterStrategyNumber, true));
+ }
+ /* FIXME: this is as dirty as it gets */
+ if(list_length(asc_pathkeys) > list_length(root->query_pathkeys))
+ {
+ asc_pathkeys = truncate_useless_pathkeys(root, rel, asc_pathkeys);
+ desc_pathkeys = truncate_useless_pathkeys(root, rel, desc_pathkeys);
+ }
+ if(asc_pathkeys)
+ rel->rel_partitioned_pathkeys = lappend(rel->rel_partitioned_pathkeys, asc_pathkeys);
+ if(desc_pathkeys)
+ rel->rel_partitioned_pathkeys = lappend(rel->rel_partitioned_pathkeys, desc_pathkeys);
+ }
+ heap_close(relation, NoLock);
+ }
+ }
+}
/*
* relation_excluded_by_constraints
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6ab78545c3..b945a86fe7 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -143,6 +143,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
rel->baserestrict_min_security = UINT_MAX;
rel->joininfo = NIL;
rel->has_eclass_joins = false;
+ rel->rel_partitioned_pathkeys = NIL;
+ rel->rel_sorted_part_oids = NIL;
/* Check type of rtable entry */
switch (rte->rtekind)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b880dc16cf..9532c09d37 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -228,6 +228,18 @@ typedef struct Append
{
Plan plan;
List *appendplans;
+ /* remaining fields are just like the sort-key info in struct Sort */
+ /* FIXME: We should either
+ * - define Append as sort + appendplans
+ * - define Append "like" sort and define MergeAppend as Append, only with
+ * a different tag
+ */
+ int numCols; /* number of sort-key columns */
+ AttrNumber *sortColIdx; /* their indexes in the target list */
+ Oid *sortOperators; /* OIDs of operators to sort them by */
+ Oid *collations; /* OIDs of collations */
+ bool *nullsFirst; /* NULLS FIRST/LAST directions */
+
} Append;
/* ----------------
@@ -237,14 +249,7 @@ typedef struct Append
*/
typedef struct MergeAppend
{
- Plan plan;
- List *mergeplans;
- /* remaining fields are just like the sort-key info in struct Sort */
- int numCols; /* number of sort-key columns */
- AttrNumber *sortColIdx; /* their indexes in the target list */
- Oid *sortOperators; /* OIDs of operators to sort them by */
- Oid *collations; /* OIDs of collations */
- bool *nullsFirst; /* NULLS FIRST/LAST directions */
+ Append plan;
} MergeAppend;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 05d6f07aea..d893bacf6e 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -531,6 +531,9 @@ typedef struct RelOptInfo
PlannerInfo *subroot; /* if subquery */
List *subplan_params; /* if subquery */
int rel_parallel_workers; /* wanted number of parallel workers */
+ List *rel_sorted_part_oids; /* if range partitioning */
+ List *rel_partitioned_pathkeys;
+
/* Information about foreign tables and foreign joins */
Oid serverid; /* identifies server for the table or join */
@@ -1117,6 +1120,7 @@ typedef struct AppendPath
{
Path path;
List *subpaths; /* list of component Paths */
+ double limit_tuples; /* hard limit on output tuples, or -1 */
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 373c7221a8..ed1f9875ab 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,8 +63,12 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers);
+extern AppendPath *create_append_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *subpaths,
+ Relids required_outer,
+ int parallel_workers,
+ List *pathkeys);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 25fe78cddd..92da03f75d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -228,4 +228,5 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
EquivalenceClass *eclass, Oid opfamily,
int strategy, bool nulls_first);
+
#endif /* PATHS_H */
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 68fd713b09..1cc3a861f4 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -34,7 +34,7 @@ extern void estimate_rel_size(Relation rel, int32 *attr_widths,
BlockNumber *pages, double *tuples, double *allvisfrac);
extern int32 get_relation_data_width(Oid relid, int32 *attr_widths);
-
+extern void generate_pathkeys_for_partitioned_tables(PlannerInfo *root);
extern bool relation_excluded_by_constraints(PlannerInfo *root,
RelOptInfo *rel, RangeTblEntry *rte);
On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com> wrote:
With range partitioning, we guarantee that each partition contains non-
overlapping values. Since we know the range allowed for each partition, it is
possible to sort them according to the partition key (as is done already for
looking up partitions).Thus, we ca generate sorted Append plans instead of MergeAppend when sorting
occurs on the partition key.
Great idea. This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com>
wrote:
With range partitioning, we guarantee that each partition contains non-
overlapping values. Since we know the range allowed for each partition, it
is possible to sort them according to the partition key (as is done
already for looking up partitions).Thus, we ca generate sorted Append plans instead of MergeAppend when
sorting occurs on the partition key.Great idea. This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.
I know it is too late, and thought that it was too early to add it to the
commitfest properly since so many design decisions should be discussed. Thanks
for the feedback, I added it.
--
Ronan Dunklau
http://dalibo.com - http://dalibo.org
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklau <ronan.dunklau@dalibo.com> wrote:
On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com>
wrote:
With range partitioning, we guarantee that each partition contains non-
overlapping values. Since we know the range allowed for each partition, it
is possible to sort them according to the partition key (as is done
already for looking up partitions).Thus, we ca generate sorted Append plans instead of MergeAppend when
sorting occurs on the partition key.Great idea. This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.I know it is too late, and thought that it was too early to add it to the
commitfest properly since so many design decisions should be discussed. Thanks
for the feedback, I added it.
Thanks for working on this. I was also thinking about the same.
I will try to get back to review/work with this for v11. Mean time, I
am working on partition-wise joins [1]/messages/by-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQvQTwzQOGczq_sw@mail.gmail.com. In those patches, I have added
a structure called PartitionScheme, which represents how a relation is
partitioned. For base relations it's mostly copy of PartitionDesc and
PartitionKey, but then it bubbles up across join, with each
partitioned join getting relevant partitioning scheme. If you could
base your design such that is uses PartitionScheme, it could be used
for joins and probably when Jeevan's patch for partition-wise
aggregate [2]http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html comes along, it can be used with grouping.
[1]: /messages/by-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQvQTwzQOGczq_sw@mail.gmail.com
[2]: http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On vendredi 24 mars 2017 08:14:03 CEST Ashutosh Bapat wrote:
On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklau <ronan.dunklau@dalibo.com>
wrote:
On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com>
wrote:
With range partitioning, we guarantee that each partition contains non-
overlapping values. Since we know the range allowed for each partition,
it
is possible to sort them according to the partition key (as is done
already for looking up partitions).Thus, we ca generate sorted Append plans instead of MergeAppend when
sorting occurs on the partition key.Great idea. This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.I know it is too late, and thought that it was too early to add it to the
commitfest properly since so many design decisions should be discussed.
Thanks for the feedback, I added it.Thanks for working on this. I was also thinking about the same.
I will try to get back to review/work with this for v11. Mean time, I
am working on partition-wise joins [1]. In those patches, I have added
a structure called PartitionScheme, which represents how a relation is
partitioned. For base relations it's mostly copy of PartitionDesc and
PartitionKey, but then it bubbles up across join, with each
partitioned join getting relevant partitioning scheme. If you could
base your design such that is uses PartitionScheme, it could be used
for joins and probably when Jeevan's patch for partition-wise
aggregate [2] comes along, it can be used with grouping.[1]
/messages/by-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQ
vQTwzQOGczq_sw%40mail.gmail.com [2]
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html
Thank you. I should probably wait until your patch is finalised before
spending too much time on it, and I could probably also leverage the
incremental sort patch if there is progress on it.
--
Ronan Dunklau
http://dalibo.com - http://dalibo.org
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/20/17 11:03, Ronan Dunklau wrote:
Great idea. This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.I know it is too late, and thought that it was too early to add it to the
commitfest properly since so many design decisions should be discussed. Thanks
for the feedback, I added it.
This patch no longer applies. Please send an update.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 31, 2017 at 5:30 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 3/20/17 11:03, Ronan Dunklau wrote:
Great idea. This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.I know it is too late, and thought that it was too early to add it to the
commitfest properly since so many design decisions should be discussed. Thanks
for the feedback, I added it.This patch no longer applies. Please send an update.
Thanks for the reminder, and sorry for very very late answer. PFA
rebased patch on current head.
I unfortunately didn't have time yet to read carefully the newly added
infrastructure (30833ba154e0c1106d61e3270242dc5999a3e4f3 and
following), so this patch is still using its own copy of partitions
list. I hope I can work on it shortly, but I prefer to send the
rebased version now, since it's still a POC with knowns problems and
questions, and I'll more than welcome any guidance with it.
Attachments:
sorted_append_v2.difftext/plain; charset=US-ASCII; name=sorted_append_v2.diffDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c1602c59cc..20e63b3204 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -80,6 +80,8 @@ static void show_upper_qual(List *qual, const char *qlabel,
ExplainState *es);
static void show_sort_keys(SortState *sortstate, List *ancestors,
ExplainState *es);
+static void show_append_keys(AppendState *mstate, List *ancestors,
+ ExplainState *es);
static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es);
static void show_agg_keys(AggState *astate, List *ancestors,
@@ -1602,6 +1604,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_sort_keys(castNode(SortState, planstate), ancestors, es);
show_sort_info(castNode(SortState, planstate), es);
break;
+ case T_Append:
+ show_append_keys(castNode(AppendState, planstate),
+ ancestors, es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
@@ -1744,7 +1750,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
ancestors, es);
break;
case T_MergeAppend:
- ExplainMemberNodes(((MergeAppend *) plan)->mergeplans,
+ ExplainMemberNodes(((MergeAppend*) plan)->plan.appendplans,
((MergeAppendState *) planstate)->mergeplans,
ancestors, es);
break;
@@ -1935,6 +1941,22 @@ show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es)
ancestors, es);
}
+/*
+ * Likewise, for an Append node.
+ */
+static void
+show_append_keys(AppendState *mstate, List *ancestors,
+ ExplainState *es)
+{
+ Append *plan = (Append *) mstate->ps.plan;
+
+ show_sort_group_keys((PlanState *) mstate, "Sort Key",
+ plan->numCols, plan->sortColIdx,
+ plan->sortOperators, plan->collations,
+ plan->nullsFirst,
+ ancestors, es);
+}
+
/*
* Likewise, for a MergeAppend node.
*/
@@ -1942,7 +1964,7 @@ static void
show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es)
{
- MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+ Append *plan = (Append *) mstate->ps.plan;
show_sort_group_keys((PlanState *) mstate, "Sort Key",
plan->numCols, plan->sortColIdx,
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 6bf490bd70..601f2547d3 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -64,6 +64,7 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
+ Append *append = &node->plan;
PlanState **mergeplanstates;
int nplans;
int i;
@@ -76,12 +77,12 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* Lock the non-leaf tables in the partition tree controlled by this node.
* It's a no-op for non-partitioned parent tables.
*/
- ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
+ ExecLockNonLeafAppendTables(append->partitioned_rels, estate);
/*
* Set up empty vector of subplan states
*/
- nplans = list_length(node->mergeplans);
+ nplans = list_length(append->appendplans);
mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
@@ -116,7 +117,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* results into the array "mergeplans".
*/
i = 0;
- foreach(lc, node->mergeplans)
+ foreach(lc, append->appendplans)
{
Plan *initNode = (Plan *) lfirst(lc);
@@ -133,17 +134,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize sort-key information
*/
- mergestate->ms_nkeys = node->numCols;
- mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * node->numCols);
+ mergestate->ms_nkeys = append->numCols;
+ mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * append->numCols);
- for (i = 0; i < node->numCols; i++)
+ for (i = 0; i < append->numCols; i++)
{
SortSupport sortKey = mergestate->ms_sortkeys + i;
sortKey->ssup_cxt = CurrentMemoryContext;
- sortKey->ssup_collation = node->collations[i];
- sortKey->ssup_nulls_first = node->nullsFirst[i];
- sortKey->ssup_attno = node->sortColIdx[i];
+ sortKey->ssup_collation = append->collations[i];
+ sortKey->ssup_nulls_first = append->nullsFirst[i];
+ sortKey->ssup_attno = append->sortColIdx[i];
/*
* It isn't feasible to perform abbreviated key conversion, since
@@ -154,7 +155,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
sortKey->abbreviate = false;
- PrepareSortSupportFromOrderingOp(node->sortOperators[i], sortKey);
+ PrepareSortSupportFromOrderingOp(append->sortOperators[i], sortKey);
}
/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1bed14e2b..23dd6043be 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -224,6 +224,19 @@ _copyModifyTable(const ModifyTable *from)
return newnode;
}
+static void
+copyAppendFields(const Append *from, Append *newnode)
+{
+ CopyPlanFields((const Plan *) from, (Plan *) newnode);
+ COPY_NODE_FIELD(partitioned_rels);
+ COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(numCols);
+ COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
+ COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
+ COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
+ COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
+}
+
/*
* _copyAppend
*/
@@ -232,16 +245,7 @@ _copyAppend(const Append *from)
{
Append *newnode = makeNode(Append);
- /*
- * copy node superclass fields
- */
- CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
- /*
- * copy remainder of node
- */
- COPY_NODE_FIELD(partitioned_rels);
- COPY_NODE_FIELD(appendplans);
+ copyAppendFields(from, newnode);
return newnode;
}
@@ -254,21 +258,7 @@ _copyMergeAppend(const MergeAppend *from)
{
MergeAppend *newnode = makeNode(MergeAppend);
- /*
- * copy node superclass fields
- */
- CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
- /*
- * copy remainder of node
- */
- COPY_NODE_FIELD(partitioned_rels);
- COPY_NODE_FIELD(mergeplans);
- COPY_SCALAR_FIELD(numCols);
- COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
- COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
- COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
- COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
+ copyAppendFields((const Append *)from, (Append *)newnode);
return newnode;
}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e3eb0c5788..ccbf11d72e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3735,7 +3735,7 @@ planstate_tree_walker(PlanState *planstate,
return true;
break;
case T_MergeAppend:
- if (planstate_walk_members(((MergeAppend *) plan)->mergeplans,
+ if (planstate_walk_members(((MergeAppend *) plan)->plan.appendplans,
((MergeAppendState *) planstate)->mergeplans,
walker, context))
return true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b83d919e40..47f276d9f5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -386,27 +386,14 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
}
static void
-_outAppend(StringInfo str, const Append *node)
+_outAppendInfo(StringInfo str, const Append *node)
{
- WRITE_NODE_TYPE("APPEND");
+ int i;
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
-}
-
-static void
-_outMergeAppend(StringInfo str, const MergeAppend *node)
-{
- int i;
-
- WRITE_NODE_TYPE("MERGEAPPEND");
-
- _outPlanInfo(str, (const Plan *) node);
-
- WRITE_NODE_FIELD(partitioned_rels);
- WRITE_NODE_FIELD(mergeplans);
WRITE_INT_FIELD(numCols);
@@ -427,6 +414,20 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
appendStringInfo(str, " %s", booltostr(node->nullsFirst[i]));
}
+static void
+_outAppend(StringInfo str, const Append *node)
+{
+ WRITE_NODE_TYPE("APPEND");
+ _outAppendInfo(str, node);
+}
+
+static void
+_outMergeAppend(StringInfo str, const MergeAppend *node)
+{
+ WRITE_NODE_TYPE("MERGEAPPEND");
+ _outAppendInfo(str, (const Append *) &node->plan);
+}
+
static void
_outRecursiveUnion(StringInfo str, const RecursiveUnion *node)
{
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index fbf8330735..270b60041c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1582,18 +1582,31 @@ _readModifyTable(void)
READ_DONE();
}
+static void
+ReadCommonAppend(Append* local_node)
+{
+ READ_TEMP_LOCALS();
+
+ ReadCommonPlan(&local_node->plan);
+
+ READ_NODE_FIELD(partitioned_rels);
+ READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(numCols);
+ READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
+ READ_OID_ARRAY(sortOperators, local_node->numCols);
+ READ_OID_ARRAY(collations, local_node->numCols);
+ READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+}
+
/*
* _readAppend
*/
static Append *
_readAppend(void)
{
- READ_LOCALS(Append);
+ READ_LOCALS_NO_FIELDS(Append);
- ReadCommonPlan(&local_node->plan);
-
- READ_NODE_FIELD(partitioned_rels);
- READ_NODE_FIELD(appendplans);
+ ReadCommonAppend(local_node);
READ_DONE();
}
@@ -1604,17 +1617,9 @@ _readAppend(void)
static MergeAppend *
_readMergeAppend(void)
{
- READ_LOCALS(MergeAppend);
-
- ReadCommonPlan(&local_node->plan);
+ READ_LOCALS_NO_FIELDS(MergeAppend);
- READ_NODE_FIELD(partitioned_rels);
- READ_NODE_FIELD(mergeplans);
- READ_INT_FIELD(numCols);
- READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
- READ_OID_ARRAY(sortOperators, local_node->numCols);
- READ_OID_ARRAY(collations, local_node->numCols);
- READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+ ReadCommonAppend(&local_node->plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a7866a99e0..f62fc5488a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -94,6 +94,10 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static int indexof_path_relation_in_oid_list(Oid oid, Oid* sorted_oids,
+ int array_size);
+static void generate_sorted_append_paths(PlannerInfo *root,
+ RelOptInfo *parent_rel, RelOptInfo **ordered_childs);
static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys,
@@ -134,7 +138,7 @@ static void recurse_push_qual(Node *setOp, Query *topquery,
RangeTblEntry *rte, Index rti, Node *qual);
static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
- List *live_childrels);
+ List *live_childrels, RelOptInfo **ordered_partition_rels);
/*
@@ -1215,12 +1219,27 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
int parentRTindex = rti;
List *live_childrels = NIL;
ListCell *l;
+ bool is_partitioned = rel->rel_sorted_part_oids != NIL;
+ int num_parts = list_length(rel->rel_sorted_part_oids);
+ RelOptInfo **ordered_partition_rels = palloc0(sizeof(RelOptInfo*) * num_parts);
+ Oid *parts_oids = palloc0(sizeof(Oid) * num_parts);
+
+ if (is_partitioned)
+ {
+ ListCell *lc;
+ int i = 0;
+ foreach (lc, rel->rel_sorted_part_oids)
+ {
+ parts_oids[i] = lfirst_oid(lc);
+ i++;
+ }
+ }
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
*/
- foreach(l, root->append_rel_list)
+ foreach (l, root->append_rel_list)
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
int childRTindex;
@@ -1260,10 +1279,21 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* Child is live, so add it to the live_childrels list for use below.
*/
live_childrels = lappend(live_childrels, childrel);
+
+ if (is_partitioned) {
+ int partindex =
+ indexof_path_relation_in_oid_list(childRTE->relid,
+ parts_oids, num_parts);
+
+ ordered_partition_rels[partindex] = childrel;
+ }
}
/* Add paths to the "append" relation. */
- add_paths_to_append_rel(root, rel, live_childrels);
+ add_paths_to_append_rel(root, rel, live_childrels, ordered_partition_rels);
+
+ pfree(parts_oids);
+ pfree(ordered_partition_rels);
}
@@ -1280,7 +1310,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
*/
static void
add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
- List *live_childrels)
+ List *live_childrels,
+ RelOptInfo **ordered_partition_rels)
{
List *subpaths = NIL;
bool subpaths_valid = true;
@@ -1292,6 +1323,7 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *partitioned_rels = NIL;
RangeTblEntry *rte;
bool build_partitioned_rels = false;
+ bool is_partitioned = rel->rel_sorted_part_oids != NIL;
/*
* A root partition will already have a PartitionedChildRelInfo, and a
@@ -1431,8 +1463,15 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
- partitioned_rels));
+ add_path(rel, (Path *) create_append_path(root, rel, subpaths, NULL, 0,
+ partitioned_rels, NIL));
+
+ /*
+ * If possible, build ordered append path matching the PathKeys derived
+ * from the partition key
+ */
+ if (is_partitioned)
+ generate_sorted_append_paths(root, rel, ordered_partition_rels);
/*
* Consider an append of partial unordered, unparameterized partial paths.
@@ -1458,8 +1497,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Assert(parallel_workers > 0);
/* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(root, rel, partial_subpaths, NULL,
+ parallel_workers, partitioned_rels, NIL);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1512,8 +1551,127 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
- partitioned_rels));
+ create_append_path(root, rel, subpaths, required_outer, 0,
+ partitioned_rels, NIL));
+ }
+}
+
+static int
+indexof_path_relation_in_oid_list(Oid oid, Oid* sorted_oids, int array_size)
+{
+ int i;
+
+ for(i=0; i < array_size; i++)
+ {
+ if(sorted_oids[i] == oid)
+ return i;
+ }
+
+ return -1;
+}
+
+
+static void
+generate_sorted_append_paths(PlannerInfo *root, RelOptInfo *parent_rel, RelOptInfo **ordered_childs)
+{
+ ListCell *lc;
+ int i;
+ List *partitions_asc = NIL;
+ List *partitions_desc = NIL;
+ RangeTblEntry * parent_rte = planner_rt_fetch(parent_rel->relid, root);
+
+ for (i = 0; i < list_length(parent_rel->rel_sorted_part_oids); i++)
+ {
+ partitions_asc = lappend(partitions_asc, ordered_childs[i]);
+ partitions_desc = lcons(ordered_childs[i], partitions_desc);
+ }
+
+ if (parent_rte->relkind != RELKIND_PARTITIONED_TABLE)
+ return;
+
+ foreach(lc, parent_rel->rel_partitioned_pathkeys)
+ {
+ List *pathkeys = (List *) lfirst(lc);
+ PathKey *first = (PathKey *) linitial(pathkeys);
+ List *ordered_partitions = first->pk_strategy == BTLessStrategyNumber ?
+ partitions_asc : partitions_desc;
+ List *startup_subpaths = NIL;
+ List *total_subpaths = NIL;
+ List *sequential_subpaths = NIL;
+ bool startup_neq_total = false;
+ ListCell *lc2;
+
+ if (compare_pathkeys(pathkeys, root->query_pathkeys) == PATHKEYS_DIFFERENT)
+ continue;
+
+ foreach (lc2, ordered_partitions)
+ {
+ RelOptInfo *childrel = lfirst(lc2);
+ Path *cheapest_startup,
+ *cheapest_total,
+ *sequential = NULL;
+
+ /* The partition may have been pruned */
+ if (!childrel)
+ continue;
+
+ //Assert(pathkeys_contained_in(pathkeys, root->query_pathkeys));
+
+ cheapest_startup = get_cheapest_path_for_pathkeys(childrel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ STARTUP_COST,
+ false);
+ cheapest_total = get_cheapest_path_for_pathkeys(childrel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ TOTAL_COST,
+ false);
+
+ /*
+ * If we can't find any paths with the right order just use the
+ * cheapest-total path; we'll have to sort it later.
+ */
+ if (cheapest_startup == NULL || cheapest_total == NULL)
+ {
+ cheapest_startup = cheapest_total =
+ childrel->cheapest_total_path;
+ /* Assert we do have an unparameterized path for this child */
+ Assert(cheapest_total->param_info == NULL);
+ }
+ /*
+ * Force a an unordered path, which could be cheaper in corner cases where
+ * orderedpaths are too expensive.
+ */
+ sequential = childrel->cheapest_total_path;
+
+ /*
+ * Notice whether we actually have different paths for the
+ * "cheapest" and "total" cases; frequently there will be no point
+ * in two create_merge_append_path() calls.
+ */
+ if (cheapest_startup != cheapest_total)
+ startup_neq_total = true;
+ startup_subpaths =
+ lappend(startup_subpaths, cheapest_startup);
+ total_subpaths =
+ lappend(total_subpaths, cheapest_total);
+ sequential_subpaths =
+ lappend(sequential_subpaths, sequential);
+
+ }
+ if (startup_subpaths)
+ {
+ add_path(parent_rel, (Path *) create_append_path(root, parent_rel, startup_subpaths,
+ NULL, 0, NIL, root->query_pathkeys));
+ }
+ if (startup_neq_total)
+ add_path(parent_rel, (Path *) create_append_path(root,
+ parent_rel, total_subpaths, NULL, 0, NIL, root->query_pathkeys));
+ if (sequential_subpaths){
+ add_path(parent_rel, (Path *) create_append_path(root,
+ parent_rel, sequential_subpaths, NULL, 0, NIL, root->query_pathkeys));
+ }
}
}
@@ -1749,7 +1907,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NIL, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6ee23509c5..9b201ecdda 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NIL, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 28216629aa..e063128251 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -78,8 +78,8 @@ static List *get_gating_quals(PlannerInfo *root, List *quals);
static Plan *create_gating_plan(PlannerInfo *root, Path *path, Plan *plan,
List *gating_quals);
static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
-static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
-static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path);
+static Plan *create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path);
+static Plan *wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples);
static Result *create_result_plan(PlannerInfo *root, ResultPath *best_path);
static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
@@ -203,7 +203,7 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
Index scanrelid, char *enrname);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(NodeTag node_type, List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -381,12 +381,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
(JoinPath *) best_path);
break;
case T_Append:
- plan = create_append_plan(root,
- (AppendPath *) best_path);
- break;
case T_MergeAppend:
- plan = create_merge_append_plan(root,
- (MergeAppendPath *) best_path);
+ plan = create_append_plan(best_path->pathtype, root,
+ (AppendPath *) best_path);
break;
case T_Result:
if (IsA(best_path, ProjectionPath))
@@ -999,13 +996,17 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
* Returns a Plan node.
*/
static Plan *
-create_append_plan(PlannerInfo *root, AppendPath *best_path)
+create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path)
{
- Append *plan;
+ Append *node;
+ Plan *plan;
List *tlist = build_path_tlist(root, &best_path->path);
+ List *pathkeys = best_path->path.pathkeys;
List *subplans = NIL;
ListCell *subpaths;
+ double limit_tuples = best_path->limit_tuples;
+
/*
* The subpaths list could be empty, if every child was proven empty by
* constraint exclusion. In that case generate a dummy plan that returns
@@ -1017,9 +1018,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
*/
if (best_path->subpaths == NIL)
{
- /* Generate a Result plan with constant-FALSE gating qual */
- Plan *plan;
-
plan = (Plan *) make_result(tlist,
(Node *) list_make1(makeBoolConst(false,
false)),
@@ -1030,62 +1028,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
return plan;
}
- /* Build the plan for each child */
- foreach(subpaths, best_path->subpaths)
- {
- Path *subpath = (Path *) lfirst(subpaths);
- Plan *subplan;
-
- /* Must insist that all children return the same tlist */
- subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
- subplans = lappend(subplans, subplan);
- }
-
- /*
- * XXX ideally, if there's just one child, we'd not bother to generate an
- * Append node but just return the single child. At the moment this does
- * not work because the varno of the child scan plan won't match the
- * parent-rel Vars it'll be asked to emit.
- */
-
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
-
- copy_generic_path_info(&plan->plan, (Path *) best_path);
-
- return (Plan *) plan;
-}
-
-/*
- * create_merge_append_plan
- * Create a MergeAppend plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
-{
- MergeAppend *node = makeNode(MergeAppend);
- Plan *plan = &node->plan;
- List *tlist = build_path_tlist(root, &best_path->path);
- List *pathkeys = best_path->path.pathkeys;
- List *subplans = NIL;
- ListCell *subpaths;
-
- /*
- * We don't have the actual creation of the MergeAppend node split out
- * into a separate make_xxx function. This is because we want to run
- * prepare_sort_from_pathkeys on it before we do so on the individual
- * child plans, to make cross-checking the sort info easier.
- */
+ node = make_append(node_type, tlist, best_path->partitioned_rels);
+ plan = &node->plan;
copy_generic_path_info(plan, (Path *) best_path);
plan->targetlist = tlist;
- plan->qual = NIL;
- plan->lefttree = NULL;
- plan->righttree = NULL;
- /* Compute sort column info, and adjust MergeAppend's tlist as needed */
(void) prepare_sort_from_pathkeys(plan, pathkeys,
best_path->path.parent->relids,
NULL,
@@ -1096,70 +1043,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
&node->collations,
&node->nullsFirst);
- /*
- * Now prepare the child plans. We must apply prepare_sort_from_pathkeys
- * even to subplans that don't need an explicit sort, to make sure they
- * are returning the same sort key columns the MergeAppend expects.
- */
- foreach(subpaths, best_path->subpaths)
+ /* Build the plan for each child */
+ foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
- Plan *subplan;
- int numsortkeys;
- AttrNumber *sortColIdx;
- Oid *sortOperators;
- Oid *collations;
- bool *nullsFirst;
-
- /* Build the child plan */
- /* Must insist that all children return the same tlist */
- subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
- /* Compute sort column info, and adjust subplan's tlist as needed */
- subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
- subpath->parent->relids,
- node->sortColIdx,
- false,
- &numsortkeys,
- &sortColIdx,
- &sortOperators,
- &collations,
- &nullsFirst);
-
- /*
- * Check that we got the same sort key information. We just Assert
- * that the sortops match, since those depend only on the pathkeys;
- * but it seems like a good idea to check the sort column numbers
- * explicitly, to ensure the tlists really do match up.
- */
- Assert(numsortkeys == node->numCols);
- if (memcmp(sortColIdx, node->sortColIdx,
- numsortkeys * sizeof(AttrNumber)) != 0)
- elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
- Assert(memcmp(sortOperators, node->sortOperators,
- numsortkeys * sizeof(Oid)) == 0);
- Assert(memcmp(collations, node->collations,
- numsortkeys * sizeof(Oid)) == 0);
- Assert(memcmp(nullsFirst, node->nullsFirst,
- numsortkeys * sizeof(bool)) == 0);
-
- /* Now, insert a Sort node if subplan isn't sufficiently ordered */
- if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
- {
- Sort *sort = make_sort(subplan, numsortkeys,
- sortColIdx, sortOperators,
- collations, nullsFirst);
-
- label_sort_with_costsize(root, sort, best_path->limit_tuples);
- subplan = (Plan *) sort;
- }
+ /* TODO: decrease limit tuples for each subpath */
+ Plan *subplan = plan = wrap_sort(root, node, subpath, pathkeys, limit_tuples);
+ if(limit_tuples > 0)
+ limit_tuples = Max(1, limit_tuples - subpath->rows);
subplans = lappend(subplans, subplan);
}
-
- node->partitioned_rels = best_path->partitioned_rels;
- node->mergeplans = subplans;
-
+ node->appendplans = subplans;
return (Plan *) node;
}
@@ -1566,7 +1461,7 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
* anyway. Usually create_projection_path will have detected that and set
* dummypp if we don't need a Result; but its decision can't be final,
* because some createplan.c routines change the tlists of their nodes.
- * (An example is that create_merge_append_plan might add resjunk sort
+ * (An example is that create_append_plan might add resjunk sort
* columns to a MergeAppend.) So we have to recheck here. If we do
* arrive at a different answer than create_projection_path did, we'll
* have made slightly wrong cost estimates; but label the plan with the
@@ -5274,21 +5169,100 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(NodeTag node_type, List *tlist, List *partitioned_rels)
{
- Append *node = makeNode(Append);
- Plan *plan = &node->plan;
+ Append *node;
+ Plan *plan;
- plan->targetlist = tlist;
+ if (node_type != T_Append && node_type != T_MergeAppend)
+ elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
+
+ switch(node_type)
+ {
+ case T_Append:
+ node = makeNode(Append);
+ break;
+ case T_MergeAppend:
+ node = (Append *) makeNode(MergeAppend);
+ break;
+ default:
+ elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
+ }
+
+ plan = &node->plan; plan->targetlist = tlist;
plan->qual = NIL;
plan->lefttree = NULL;
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
- node->appendplans = appendplans;
+ node->appendplans = NIL;
+ node->numCols = 0;
+ node->sortColIdx = NULL;
+ node->sortOperators = NULL;
+ node->collations = NULL;
+ node->nullsFirst = NULL;
return node;
}
+static Plan *
+wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples)
+{
+ int numCols;
+ AttrNumber *sortColIdx;
+ Oid *sortOperators;
+ Oid *collations;
+ bool *nullsFirst;
+ Plan *subplan;
+
+ /* Build the child plan */
+ /* Must insist that all children return the same tlist */
+ subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
+
+ if (pathkeys != NIL)
+ {
+ /* Compute sort column info, and adjust subplan's tlist as needed */
+ subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
+ subpath->parent->relids,
+ parentplan->sortColIdx,
+ false,
+ &numCols,
+ &sortColIdx,
+ &sortOperators,
+ &collations,
+ &nullsFirst);
+
+ /*
+ * Check that we got the same sort key information. We just Assert
+ * that the sortops match, since those depend only on the pathkeys;
+ * but it seems like a good idea to check the sort column numbers
+ * explicitly, to ensure the tlists really do match up.
+ */
+ Assert(numCols == parentplan->numCols);
+ if (memcmp(sortColIdx, parentplan->sortColIdx,
+ numCols * sizeof(AttrNumber)) != 0)
+ elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
+ Assert(memcmp(sortOperators, parentplan->sortOperators,
+ numCols * sizeof(Oid)) == 0);
+ Assert(memcmp(collations, parentplan->collations,
+ numCols * sizeof(Oid)) == 0);
+ Assert(memcmp(nullsFirst, parentplan->nullsFirst,
+ numCols * sizeof(bool)) == 0);
+
+ /* Now, insert a Sort node if subplan isn't sufficiently ordered */
+ if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
+ {
+ Sort *sort = make_sort(subplan, numCols,
+ sortColIdx, sortOperators,
+ collations, nullsFirst);
+
+ label_sort_with_costsize(root, sort, limit_tuples);
+ subplan = (Plan *) sort;
+ }
+ }
+
+ return subplan;
+}
+
static RecursiveUnion *
make_recursive_union(List *tlist,
Plan *lefttree,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index f4e0a6ea3d..acf87cc917 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -25,6 +25,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
#include "optimizer/planmain.h"
@@ -176,6 +177,13 @@ query_planner(PlannerInfo *root, List *tlist,
*/
(*qp_callback) (root, qp_extra);
+ /*
+ * We consider generating pathkeys for partitioned tables only if the
+ * query has some ordering
+ */
+ if (root->query_pathkeys != NIL)
+ generate_pathkeys_for_partitioned_tables(root);
+
/*
* Examine any "placeholder" expressions generated during subquery pullup.
* Make sure that the Vars they need are marked as needed at the relevant
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7f146d670c..175088313b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3643,10 +3643,12 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
+ create_append_path(root,
+ grouped_rel,
paths,
NULL,
0,
+ NIL,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b0c9e94459..c9280af5c1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -932,12 +932,12 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
* targetlist or check quals.
*/
set_dummy_tlist_references(plan, rtoffset);
- Assert(splan->plan.qual == NIL);
- foreach(l, splan->partitioned_rels)
+ Assert(splan->plan.plan.qual == NIL);
+ foreach(l, splan->plan.partitioned_rels)
{
lfirst_int(l) += rtoffset;
}
- foreach(l, splan->mergeplans)
+ foreach(l, splan->plan.appendplans)
{
lfirst(l) = set_plan_refs(root,
(Plan *) lfirst(l),
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 1103984779..19fc190f7d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2589,7 +2589,7 @@ finalize_plan(PlannerInfo *root, Plan *plan,
{
ListCell *l;
- foreach(l, ((MergeAppend *) plan)->mergeplans)
+ foreach(l, ((MergeAppend *) plan)->plan.appendplans)
{
context.paramids =
bms_add_members(context.paramids,
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 3e0c3de86d..fc615a4314 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -590,7 +590,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -702,7 +702,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 26567cb7f6..33906d3d64 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1200,11 +1200,21 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths,
+ Relids required_outer, int parallel_workers,
+ List *partitioned_rels, List *pathkeys)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
+ double limit_tuples;
+ Cost input_startup_cost;
+ Cost input_total_cost;
+
+ /*
+ * Just to make sure that nobody is trying to build a parallel sorted
+ * append path
+ */
+ Assert(parallel_workers > 0 ? pathkeys == NIL : true);
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
@@ -1214,7 +1224,7 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
- pathnode->path.pathkeys = NIL; /* result is always considered unsorted */
+ pathnode->path.pathkeys = pathkeys;
pathnode->partitioned_rels = list_copy(partitioned_rels);
pathnode->subpaths = subpaths;
@@ -1229,15 +1239,47 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.rows = 0;
pathnode->path.startup_cost = 0;
pathnode->path.total_cost = 0;
+
+ if (root && bms_equal(rel->relids, root->all_baserels))
+ pathnode->limit_tuples = root->limit_tuples;
+ else
+ pathnode->limit_tuples = -1.0;
+
+ limit_tuples = pathnode->limit_tuples;
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
+ input_startup_cost = subpath->startup_cost;
+ input_total_cost = subpath->total_cost;
+
+ if(pathkeys && !pathkeys_contained_in(pathkeys, subpath->pathkeys))
+ {
+ Path sort_path; /* dummy for result of cost_sort */
+
+ cost_sort(&sort_path,
+ root,
+ pathkeys,
+ subpath->total_cost,
+ subpath->rows,
+ subpath->pathtarget->width,
+ 0.0,
+ work_mem,
+ limit_tuples);
+ input_startup_cost += sort_path.startup_cost;
+ input_total_cost = sort_path.total_cost;
+ }
pathnode->path.rows += subpath->rows;
if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
+ pathnode->path.startup_cost = input_startup_cost;
+ /* Consider that the number of tuples to be fetched decreases
+ * for every subsequent partition */
+ if(limit_tuples > 0)
+ limit_tuples = Max(1, limit_tuples - subpath->rows);
+
+ pathnode->path.total_cost += input_total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index a1ebd4acc8..b4f7a3d932 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -35,6 +35,7 @@
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
+#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/predtest.h"
#include "optimizer/prep.h"
@@ -1258,6 +1259,148 @@ get_relation_constraints(PlannerInfo *root,
return result;
}
+/*
+ * Generate pathkeys for Range-based partitions
+ */
+void
+generate_pathkeys_for_partitioned_tables(PlannerInfo *root)
+{
+ int i;
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[i];
+
+ /* Only base relation can be partitionned */
+ if (rel && has_useful_pathkeys(root, rel))
+ {
+ Index varno = rel->relid;
+ Index parent_varno = varno;
+ RelOptInfo *parent_rel = rel;
+ Relation relation;
+ RangeTblEntry *rte = planner_rt_fetch(varno, root);
+
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ continue;
+
+ while (parent_rel->reloptkind != RELOPT_BASEREL)
+ {
+ ListCell *lc;
+
+ foreach(lc, root->append_rel_list)
+ {
+ AppendRelInfo *ari = lfirst(lc);
+
+ if (ari->child_relid == parent_rel->relid)
+ {
+ parent_rel = root->simple_rel_array[ari->parent_relid];
+ break;
+ }
+
+ /* Should never happen: we should always be able to climb up the
+ * inheritance tree */
+ if(!lc)
+ elog(ERROR, "Unable to find parent table for child ");
+ }
+ }
+ parent_varno = parent_rel->relid;
+
+ /*
+ * Ignore base rel if it's not a partitionned table. We'll still
+ * have to open it to verify if it's a range partitionned table
+ */
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ continue;
+
+ relation = heap_open(rte->relid, NoLock);
+
+ /*
+ * Store the child partitions OIDs and build pathkeys for the
+ * partitioning keys
+ */
+ if(relation->rd_partkey->strategy == PARTITION_STRATEGY_RANGE)
+ {
+ int j;
+ ListCell *lc;
+ Oid equality_op;
+ EquivalenceClass *ec;
+ List *opfamilies;
+ List *asc_pathkeys = NIL;
+ List *desc_pathkeys = NIL;
+
+ Assert(rel->rel_sorted_part_oids == NIL);
+ Assert(rel->rel_partitioned_pathkeys == NIL);
+
+ for (j = 0; j < relation->rd_partdesc->nparts; j++)
+ {
+ rel->rel_sorted_part_oids =
+ lappend_oid(rel->rel_sorted_part_oids,
+ relation->rd_partdesc->oids[j]);
+ }
+
+ /* Lookup individual vars from the pathtarget */
+ /* FIXME: refactor this in an external function */
+ lc = list_head(relation->rd_partkey->partexprs);
+ for (j=0; j < relation->rd_partkey->partnatts; j++)
+ {
+ AttrNumber attno = relation->rd_partkey->partattrs[j];
+ Expr *expr = NULL;
+
+ /* This is not an attribute, but an expression */
+ if(attno == InvalidAttrNumber)
+ {
+ /* Should never append : we should be able to fetch
+ * an expression for anything in the partition key */
+ if (!lc)
+ elog(ERROR, "Could not find expression for partition key");
+ expr = lfirst(lc);
+ lc = lnext(lc);
+ }
+ else
+ {
+ expr = (Expr*) makeVar(parent_varno, attno, relation->rd_partkey->parttypid[j],
+ relation->rd_partkey->parttypmod[j],
+ relation->rd_partkey->parttypcoll[j],
+ 0);
+ }
+
+ equality_op = get_opfamily_member(relation->rd_partkey->partopfamily[j],
+ relation->rd_partkey->partopcintype[j],
+ relation->rd_partkey->partopcintype[j],
+ BTEqualStrategyNumber);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ ec = get_eclass_for_sort_expr(root, expr,
+ NULL, opfamilies,
+ relation->rd_partkey->partopcintype[j],
+ relation->rd_partkey->partcollation[j],
+ 0, rel->relids, true);
+ asc_pathkeys = lappend(asc_pathkeys, make_canonical_pathkey(root,
+ ec,
+ relation->rd_partkey->partopfamily[j],
+ BTLessStrategyNumber, false));
+ desc_pathkeys = lappend(desc_pathkeys, make_canonical_pathkey(root,
+ ec,
+ relation->rd_partkey->partopfamily[j],
+ BTGreaterStrategyNumber, true));
+ }
+
+ /* FIXME: this is as dirty as it gets */
+ if(list_length(asc_pathkeys) > list_length(root->query_pathkeys))
+ {
+ asc_pathkeys = truncate_useless_pathkeys(root, rel, asc_pathkeys);
+ desc_pathkeys = truncate_useless_pathkeys(root, rel, desc_pathkeys);
+ }
+
+ if(asc_pathkeys)
+ rel->rel_partitioned_pathkeys = lappend(rel->rel_partitioned_pathkeys, asc_pathkeys);
+
+ if(desc_pathkeys)
+ rel->rel_partitioned_pathkeys = lappend(rel->rel_partitioned_pathkeys, desc_pathkeys);
+ }
+ heap_close(relation, NoLock);
+ }
+ }
+}
+
/*
* get_relation_statistics
* Retrieve extended statistics defined on the table.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c7b2695ebb..3a2f9e11de 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -146,6 +146,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->baserestrict_min_security = UINT_MAX;
rel->joininfo = NIL;
rel->has_eclass_joins = false;
+ rel->rel_partitioned_pathkeys = NIL;
+ rel->rel_sorted_part_oids = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a382331f41..283af54e09 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -248,6 +248,17 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ /* remaining fields are just like the sort-key info in struct Sort */
+ /* FIXME: We should either
+ * - define Append as sort + appendplans
+ * - define Append "like" sort and define MergeAppend as Append, only with
+ * a different tag
+ */
+ int numCols; /* number of sort-key columns */
+ AttrNumber *sortColIdx; /* their indexes in the target list */
+ Oid *sortOperators; /* OIDs of operators to sort them by */
+ Oid *collations; /* OIDs of collations */
+ bool *nullsFirst; /* NULLS FIRST/LAST directions */
} Append;
/* ----------------
@@ -257,16 +268,7 @@ typedef struct Append
*/
typedef struct MergeAppend
{
- Plan plan;
- /* RT indexes of non-leaf tables in a partition tree */
- List *partitioned_rels;
- List *mergeplans;
- /* remaining fields are just like the sort-key info in struct Sort */
- int numCols; /* number of sort-key columns */
- AttrNumber *sortColIdx; /* their indexes in the target list */
- Oid *sortOperators; /* OIDs of operators to sort them by */
- Oid *collations; /* OIDs of collations */
- bool *nullsFirst; /* NULLS FIRST/LAST directions */
+ Append plan;
} MergeAppend;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d50ff55681..73d9851d23 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -567,6 +567,8 @@ typedef struct RelOptInfo
PlannerInfo *subroot; /* if subquery */
List *subplan_params; /* if subquery */
int rel_parallel_workers; /* wanted number of parallel workers */
+ List *rel_sorted_part_oids; /* if range partitioning */
+ List *rel_partitioned_pathkeys;
/* Information about foreign tables and foreign joins */
Oid serverid; /* identifies server for the table or join */
@@ -1178,6 +1180,7 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+ double limit_tuples; /* hard limit on output tuples, or -1 */
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e372f8862b..1496bdc0dc 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,9 +63,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern AppendPath *create_append_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *subpaths,
+ Relids required_outer,
+ int parallel_workers,
+ List *partitioned_rels,
+ List *pathkeys);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..6d96d54b5d 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -35,6 +35,8 @@ extern void estimate_rel_size(Relation rel, int32 *attr_widths,
extern int32 get_relation_data_width(Oid relid, int32 *attr_widths);
+extern void generate_pathkeys_for_partitioned_tables(PlannerInfo *root);
+
extern bool relation_excluded_by_constraints(PlannerInfo *root,
RelOptInfo *rel, RangeTblEntry *rte);
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..03d9cbb242 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1952,13 +1952,13 @@ explain (costs off) select min(a), max(a) from parted_minmax where b = '12345';
Result
InitPlan 1 (returns $0)
-> Limit
- -> Merge Append
+ -> Append
Sort Key: parted_minmax1.a
-> Index Only Scan using parted_minmax1i on parted_minmax1
Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
InitPlan 2 (returns $1)
-> Limit
- -> Merge Append
+ -> Append
Sort Key: parted_minmax1_1.a DESC
-> Index Only Scan Backward using parted_minmax1i on parted_minmax1 parted_minmax1_1
Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
On Thu, Sep 21, 2017 at 3:30 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Aug 31, 2017 at 5:30 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:On 3/20/17 11:03, Ronan Dunklau wrote:
Great idea. This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.I know it is too late, and thought that it was too early to add it to the
commitfest properly since so many design decisions should be discussed. Thanks
for the feedback, I added it.This patch no longer applies. Please send an update.
Thanks for the reminder, and sorry for very very late answer. PFA
rebased patch on current head.I unfortunately didn't have time yet to read carefully the newly added
infrastructure (30833ba154e0c1106d61e3270242dc5999a3e4f3 and
following), so this patch is still using its own copy of partitions
list. I hope I can work on it shortly, but I prefer to send the
rebased version now, since it's still a POC with knowns problems and
questions, and I'll more than welcome any guidance with it.
With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
RelOptInfos of partitions in the RelOptInfo of partitioned table. For
range partitioned table without default partition, they are arranged
in the ascending order of partition bounds. This patch may avoid
MergeAppend if the sort keys match partition keys by creating an
Append path by picking sorted paths from partition RelOptInfos in
order. You may use slightly modified version of
add_paths_to_append_rel().
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
RelOptInfos of partitions in the RelOptInfo of partitioned table. For
range partitioned table without default partition, they are arranged
in the ascending order of partition bounds. This patch may avoid
MergeAppend if the sort keys match partition keys by creating an
Append path by picking sorted paths from partition RelOptInfos in
order. You may use slightly modified version of
add_paths_to_append_rel().
Yes, I just saw this commit this morning, and this is exactly what I
was missing, thanks for the pointer and the patch!
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 21, 2017 at 11:13 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
RelOptInfos of partitions in the RelOptInfo of partitioned table. For
range partitioned table without default partition, they are arranged
in the ascending order of partition bounds. This patch may avoid
MergeAppend if the sort keys match partition keys by creating an
Append path by picking sorted paths from partition RelOptInfos in
order. You may use slightly modified version of
add_paths_to_append_rel().Yes, I just saw this commit this morning, and this is exactly what I
was missing, thanks for the pointer and the patch!
PFA v3 of the patch, once again rebased and that now use all the newly
available infrastructure.
I also added a check that a sorted append can't be generated when a
default partition has been declared.
Attachments:
sorted_append_v3.difftext/plain; charset=US-ASCII; name=sorted_append_v3.diffDownload
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c1602c59cc..20e63b3204 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -80,6 +80,8 @@ static void show_upper_qual(List *qual, const char *qlabel,
ExplainState *es);
static void show_sort_keys(SortState *sortstate, List *ancestors,
ExplainState *es);
+static void show_append_keys(AppendState *mstate, List *ancestors,
+ ExplainState *es);
static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es);
static void show_agg_keys(AggState *astate, List *ancestors,
@@ -1602,6 +1604,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_sort_keys(castNode(SortState, planstate), ancestors, es);
show_sort_info(castNode(SortState, planstate), es);
break;
+ case T_Append:
+ show_append_keys(castNode(AppendState, planstate),
+ ancestors, es);
+ break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, planstate),
ancestors, es);
@@ -1744,7 +1750,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
ancestors, es);
break;
case T_MergeAppend:
- ExplainMemberNodes(((MergeAppend *) plan)->mergeplans,
+ ExplainMemberNodes(((MergeAppend*) plan)->plan.appendplans,
((MergeAppendState *) planstate)->mergeplans,
ancestors, es);
break;
@@ -1935,6 +1941,22 @@ show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es)
ancestors, es);
}
+/*
+ * Likewise, for an Append node.
+ */
+static void
+show_append_keys(AppendState *mstate, List *ancestors,
+ ExplainState *es)
+{
+ Append *plan = (Append *) mstate->ps.plan;
+
+ show_sort_group_keys((PlanState *) mstate, "Sort Key",
+ plan->numCols, plan->sortColIdx,
+ plan->sortOperators, plan->collations,
+ plan->nullsFirst,
+ ancestors, es);
+}
+
/*
* Likewise, for a MergeAppend node.
*/
@@ -1942,7 +1964,7 @@ static void
show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es)
{
- MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+ Append *plan = (Append *) mstate->ps.plan;
show_sort_group_keys((PlanState *) mstate, "Sort Key",
plan->numCols, plan->sortColIdx,
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 6bf490bd70..601f2547d3 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -64,6 +64,7 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
+ Append *append = &node->plan;
PlanState **mergeplanstates;
int nplans;
int i;
@@ -76,12 +77,12 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* Lock the non-leaf tables in the partition tree controlled by this node.
* It's a no-op for non-partitioned parent tables.
*/
- ExecLockNonLeafAppendTables(node->partitioned_rels, estate);
+ ExecLockNonLeafAppendTables(append->partitioned_rels, estate);
/*
* Set up empty vector of subplan states
*/
- nplans = list_length(node->mergeplans);
+ nplans = list_length(append->appendplans);
mergeplanstates = (PlanState **) palloc0(nplans * sizeof(PlanState *));
@@ -116,7 +117,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* results into the array "mergeplans".
*/
i = 0;
- foreach(lc, node->mergeplans)
+ foreach(lc, append->appendplans)
{
Plan *initNode = (Plan *) lfirst(lc);
@@ -133,17 +134,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* initialize sort-key information
*/
- mergestate->ms_nkeys = node->numCols;
- mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * node->numCols);
+ mergestate->ms_nkeys = append->numCols;
+ mergestate->ms_sortkeys = palloc0(sizeof(SortSupportData) * append->numCols);
- for (i = 0; i < node->numCols; i++)
+ for (i = 0; i < append->numCols; i++)
{
SortSupport sortKey = mergestate->ms_sortkeys + i;
sortKey->ssup_cxt = CurrentMemoryContext;
- sortKey->ssup_collation = node->collations[i];
- sortKey->ssup_nulls_first = node->nullsFirst[i];
- sortKey->ssup_attno = node->sortColIdx[i];
+ sortKey->ssup_collation = append->collations[i];
+ sortKey->ssup_nulls_first = append->nullsFirst[i];
+ sortKey->ssup_attno = append->sortColIdx[i];
/*
* It isn't feasible to perform abbreviated key conversion, since
@@ -154,7 +155,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
sortKey->abbreviate = false;
- PrepareSortSupportFromOrderingOp(node->sortOperators[i], sortKey);
+ PrepareSortSupportFromOrderingOp(append->sortOperators[i], sortKey);
}
/*
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1bed14e2b..23dd6043be 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -224,6 +224,19 @@ _copyModifyTable(const ModifyTable *from)
return newnode;
}
+static void
+copyAppendFields(const Append *from, Append *newnode)
+{
+ CopyPlanFields((const Plan *) from, (Plan *) newnode);
+ COPY_NODE_FIELD(partitioned_rels);
+ COPY_NODE_FIELD(appendplans);
+ COPY_SCALAR_FIELD(numCols);
+ COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
+ COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
+ COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
+ COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
+}
+
/*
* _copyAppend
*/
@@ -232,16 +245,7 @@ _copyAppend(const Append *from)
{
Append *newnode = makeNode(Append);
- /*
- * copy node superclass fields
- */
- CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
- /*
- * copy remainder of node
- */
- COPY_NODE_FIELD(partitioned_rels);
- COPY_NODE_FIELD(appendplans);
+ copyAppendFields(from, newnode);
return newnode;
}
@@ -254,21 +258,7 @@ _copyMergeAppend(const MergeAppend *from)
{
MergeAppend *newnode = makeNode(MergeAppend);
- /*
- * copy node superclass fields
- */
- CopyPlanFields((const Plan *) from, (Plan *) newnode);
-
- /*
- * copy remainder of node
- */
- COPY_NODE_FIELD(partitioned_rels);
- COPY_NODE_FIELD(mergeplans);
- COPY_SCALAR_FIELD(numCols);
- COPY_POINTER_FIELD(sortColIdx, from->numCols * sizeof(AttrNumber));
- COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
- COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
- COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
+ copyAppendFields((const Append *)from, (Append *)newnode);
return newnode;
}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e3eb0c5788..ccbf11d72e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3735,7 +3735,7 @@ planstate_tree_walker(PlanState *planstate,
return true;
break;
case T_MergeAppend:
- if (planstate_walk_members(((MergeAppend *) plan)->mergeplans,
+ if (planstate_walk_members(((MergeAppend *) plan)->plan.appendplans,
((MergeAppendState *) planstate)->mergeplans,
walker, context))
return true;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b83d919e40..47f276d9f5 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -386,27 +386,14 @@ _outModifyTable(StringInfo str, const ModifyTable *node)
}
static void
-_outAppend(StringInfo str, const Append *node)
+_outAppendInfo(StringInfo str, const Append *node)
{
- WRITE_NODE_TYPE("APPEND");
+ int i;
_outPlanInfo(str, (const Plan *) node);
WRITE_NODE_FIELD(partitioned_rels);
WRITE_NODE_FIELD(appendplans);
-}
-
-static void
-_outMergeAppend(StringInfo str, const MergeAppend *node)
-{
- int i;
-
- WRITE_NODE_TYPE("MERGEAPPEND");
-
- _outPlanInfo(str, (const Plan *) node);
-
- WRITE_NODE_FIELD(partitioned_rels);
- WRITE_NODE_FIELD(mergeplans);
WRITE_INT_FIELD(numCols);
@@ -427,6 +414,20 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
appendStringInfo(str, " %s", booltostr(node->nullsFirst[i]));
}
+static void
+_outAppend(StringInfo str, const Append *node)
+{
+ WRITE_NODE_TYPE("APPEND");
+ _outAppendInfo(str, node);
+}
+
+static void
+_outMergeAppend(StringInfo str, const MergeAppend *node)
+{
+ WRITE_NODE_TYPE("MERGEAPPEND");
+ _outAppendInfo(str, (const Append *) &node->plan);
+}
+
static void
_outRecursiveUnion(StringInfo str, const RecursiveUnion *node)
{
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index fbf8330735..270b60041c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1582,18 +1582,31 @@ _readModifyTable(void)
READ_DONE();
}
+static void
+ReadCommonAppend(Append* local_node)
+{
+ READ_TEMP_LOCALS();
+
+ ReadCommonPlan(&local_node->plan);
+
+ READ_NODE_FIELD(partitioned_rels);
+ READ_NODE_FIELD(appendplans);
+ READ_INT_FIELD(numCols);
+ READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
+ READ_OID_ARRAY(sortOperators, local_node->numCols);
+ READ_OID_ARRAY(collations, local_node->numCols);
+ READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+}
+
/*
* _readAppend
*/
static Append *
_readAppend(void)
{
- READ_LOCALS(Append);
+ READ_LOCALS_NO_FIELDS(Append);
- ReadCommonPlan(&local_node->plan);
-
- READ_NODE_FIELD(partitioned_rels);
- READ_NODE_FIELD(appendplans);
+ ReadCommonAppend(local_node);
READ_DONE();
}
@@ -1604,17 +1617,9 @@ _readAppend(void)
static MergeAppend *
_readMergeAppend(void)
{
- READ_LOCALS(MergeAppend);
-
- ReadCommonPlan(&local_node->plan);
+ READ_LOCALS_NO_FIELDS(MergeAppend);
- READ_NODE_FIELD(partitioned_rels);
- READ_NODE_FIELD(mergeplans);
- READ_INT_FIELD(numCols);
- READ_ATTRNUMBER_ARRAY(sortColIdx, local_node->numCols);
- READ_OID_ARRAY(sortOperators, local_node->numCols);
- READ_OID_ARRAY(collations, local_node->numCols);
- READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
+ ReadCommonAppend(&local_node->plan);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a7866a99e0..c39b537366 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -20,6 +20,7 @@
#include "access/sysattr.h"
#include "access/tsmapi.h"
+#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "catalog/pg_operator.h"
#include "catalog/pg_proc.h"
@@ -94,6 +95,8 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void generate_sorted_append_paths(PlannerInfo *root,
+ RelOptInfo *parent_rel);
static void generate_mergeappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys,
@@ -1431,8 +1434,19 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
* if we have zero or one live subpath due to constraint exclusion.)
*/
if (subpaths_valid)
- add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
- partitioned_rels));
+ add_path(rel, (Path *) create_append_path(root, rel, subpaths, NULL, 0,
+ partitioned_rels, NIL));
+
+ /*
+ * If possible, build ordered append path matching the PathKeys derived
+ * from the partition key. A native order is possible if it's a ranged
+ * partitioning and it doesn't have a default partition.
+ */
+ if (rel->part_scheme &&
+ rel->part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
+ get_default_partition_oid(rte->relid) == InvalidOid
+ )
+ generate_sorted_append_paths(root, rel);
/*
* Consider an append of partial unordered, unparameterized partial paths.
@@ -1458,8 +1472,8 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
Assert(parallel_workers > 0);
/* Generate a partial append path. */
- appendpath = create_append_path(rel, partial_subpaths, NULL,
- parallel_workers, partitioned_rels);
+ appendpath = create_append_path(root, rel, partial_subpaths, NULL,
+ parallel_workers, partitioned_rels, NIL);
add_partial_path(rel, (Path *) appendpath);
}
@@ -1512,8 +1526,112 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
if (subpaths_valid)
add_path(rel, (Path *)
- create_append_path(rel, subpaths, required_outer, 0,
- partitioned_rels));
+ create_append_path(root, rel, subpaths, required_outer, 0,
+ partitioned_rels, NIL));
+ }
+}
+
+static void
+generate_sorted_append_paths(PlannerInfo *root, RelOptInfo *parent_rel)
+{
+ ListCell *lc;
+ int i;
+ List *partitions_asc = NIL;
+ List *partitions_desc = NIL;
+ RangeTblEntry * parent_rte = planner_rt_fetch(parent_rel->relid, root);
+
+ if (parent_rte->relkind != RELKIND_PARTITIONED_TABLE)
+ return;
+
+ for (i = 0; i < parent_rel->nparts; i++)
+ {
+ partitions_asc = lappend(partitions_asc, parent_rel->part_rels[i]);
+ partitions_desc = lcons(parent_rel->part_rels[i], partitions_desc);
+ }
+
+ foreach(lc, parent_rel->part_pathkeys)
+ {
+ List *pathkeys = (List *) lfirst(lc);
+ PathKey *first = (PathKey *) linitial(pathkeys);
+ List *ordered_partitions = first->pk_strategy == BTLessStrategyNumber ?
+ partitions_asc : partitions_desc;
+ List *startup_subpaths = NIL;
+ List *total_subpaths = NIL;
+ List *sequential_subpaths = NIL;
+ bool startup_neq_total = false;
+ ListCell *lc2;
+
+ if (compare_pathkeys(pathkeys, root->query_pathkeys) == PATHKEYS_DIFFERENT)
+ continue;
+
+ foreach (lc2, ordered_partitions)
+ {
+ RelOptInfo *childrel = lfirst(lc2);
+ Path *cheapest_startup,
+ *cheapest_total,
+ *sequential = NULL;
+
+ /* The partition may have been pruned */
+ if (!childrel)
+ continue;
+
+ //Assert(pathkeys_contained_in(pathkeys, root->query_pathkeys));
+
+ cheapest_startup = get_cheapest_path_for_pathkeys(childrel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ STARTUP_COST,
+ false);
+ cheapest_total = get_cheapest_path_for_pathkeys(childrel->pathlist,
+ root->query_pathkeys,
+ NULL,
+ TOTAL_COST,
+ false);
+
+ /*
+ * If we can't find any paths with the right order just use the
+ * cheapest-total path; we'll have to sort it later.
+ */
+ if (cheapest_startup == NULL || cheapest_total == NULL)
+ {
+ cheapest_startup = cheapest_total =
+ childrel->cheapest_total_path;
+ /* Assert we do have an unparameterized path for this child */
+ Assert(cheapest_total->param_info == NULL);
+ }
+ /*
+ * Force a an unordered path, which could be cheaper in corner cases where
+ * orderedpaths are too expensive.
+ */
+ sequential = childrel->cheapest_total_path;
+
+ /*
+ * Notice whether we actually have different paths for the
+ * "cheapest" and "total" cases; frequently there will be no point
+ * in two create_merge_append_path() calls.
+ */
+ if (cheapest_startup != cheapest_total)
+ startup_neq_total = true;
+ startup_subpaths =
+ lappend(startup_subpaths, cheapest_startup);
+ total_subpaths =
+ lappend(total_subpaths, cheapest_total);
+ sequential_subpaths =
+ lappend(sequential_subpaths, sequential);
+
+ }
+ if (startup_subpaths)
+ {
+ add_path(parent_rel, (Path *) create_append_path(root, parent_rel, startup_subpaths,
+ NULL, 0, NIL, root->query_pathkeys));
+ }
+ if (startup_neq_total)
+ add_path(parent_rel, (Path *) create_append_path(root,
+ parent_rel, total_subpaths, NULL, 0, NIL, root->query_pathkeys));
+ if (sequential_subpaths){
+ add_path(parent_rel, (Path *) create_append_path(root,
+ parent_rel, sequential_subpaths, NULL, 0, NIL, root->query_pathkeys));
+ }
}
}
@@ -1749,7 +1867,7 @@ set_dummy_rel_pathlist(RelOptInfo *rel)
rel->pathlist = NIL;
rel->partial_pathlist = NIL;
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NIL, NIL));
/*
* We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 6ee23509c5..9b201ecdda 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1217,7 +1217,7 @@ mark_dummy_rel(RelOptInfo *rel)
rel->partial_pathlist = NIL;
/* Set up the dummy path */
- add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
+ add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NULL, 0, NIL, NIL));
/* Set or update cheapest_total_path and related fields */
set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 28216629aa..e063128251 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -78,8 +78,8 @@ static List *get_gating_quals(PlannerInfo *root, List *quals);
static Plan *create_gating_plan(PlannerInfo *root, Path *path, Plan *plan,
List *gating_quals);
static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
-static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
-static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path);
+static Plan *create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path);
+static Plan *wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples);
static Result *create_result_plan(PlannerInfo *root, ResultPath *best_path);
static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
@@ -203,7 +203,7 @@ static NamedTuplestoreScan *make_namedtuplestorescan(List *qptlist, List *qpqual
Index scanrelid, char *enrname);
static WorkTableScan *make_worktablescan(List *qptlist, List *qpqual,
Index scanrelid, int wtParam);
-static Append *make_append(List *appendplans, List *tlist, List *partitioned_rels);
+static Append *make_append(NodeTag node_type, List *tlist, List *partitioned_rels);
static RecursiveUnion *make_recursive_union(List *tlist,
Plan *lefttree,
Plan *righttree,
@@ -381,12 +381,9 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
(JoinPath *) best_path);
break;
case T_Append:
- plan = create_append_plan(root,
- (AppendPath *) best_path);
- break;
case T_MergeAppend:
- plan = create_merge_append_plan(root,
- (MergeAppendPath *) best_path);
+ plan = create_append_plan(best_path->pathtype, root,
+ (AppendPath *) best_path);
break;
case T_Result:
if (IsA(best_path, ProjectionPath))
@@ -999,13 +996,17 @@ create_join_plan(PlannerInfo *root, JoinPath *best_path)
* Returns a Plan node.
*/
static Plan *
-create_append_plan(PlannerInfo *root, AppendPath *best_path)
+create_append_plan(NodeTag node_type, PlannerInfo *root, AppendPath *best_path)
{
- Append *plan;
+ Append *node;
+ Plan *plan;
List *tlist = build_path_tlist(root, &best_path->path);
+ List *pathkeys = best_path->path.pathkeys;
List *subplans = NIL;
ListCell *subpaths;
+ double limit_tuples = best_path->limit_tuples;
+
/*
* The subpaths list could be empty, if every child was proven empty by
* constraint exclusion. In that case generate a dummy plan that returns
@@ -1017,9 +1018,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
*/
if (best_path->subpaths == NIL)
{
- /* Generate a Result plan with constant-FALSE gating qual */
- Plan *plan;
-
plan = (Plan *) make_result(tlist,
(Node *) list_make1(makeBoolConst(false,
false)),
@@ -1030,62 +1028,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
return plan;
}
- /* Build the plan for each child */
- foreach(subpaths, best_path->subpaths)
- {
- Path *subpath = (Path *) lfirst(subpaths);
- Plan *subplan;
-
- /* Must insist that all children return the same tlist */
- subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
- subplans = lappend(subplans, subplan);
- }
-
- /*
- * XXX ideally, if there's just one child, we'd not bother to generate an
- * Append node but just return the single child. At the moment this does
- * not work because the varno of the child scan plan won't match the
- * parent-rel Vars it'll be asked to emit.
- */
-
- plan = make_append(subplans, tlist, best_path->partitioned_rels);
-
- copy_generic_path_info(&plan->plan, (Path *) best_path);
-
- return (Plan *) plan;
-}
-
-/*
- * create_merge_append_plan
- * Create a MergeAppend plan for 'best_path' and (recursively) plans
- * for its subpaths.
- *
- * Returns a Plan node.
- */
-static Plan *
-create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
-{
- MergeAppend *node = makeNode(MergeAppend);
- Plan *plan = &node->plan;
- List *tlist = build_path_tlist(root, &best_path->path);
- List *pathkeys = best_path->path.pathkeys;
- List *subplans = NIL;
- ListCell *subpaths;
-
- /*
- * We don't have the actual creation of the MergeAppend node split out
- * into a separate make_xxx function. This is because we want to run
- * prepare_sort_from_pathkeys on it before we do so on the individual
- * child plans, to make cross-checking the sort info easier.
- */
+ node = make_append(node_type, tlist, best_path->partitioned_rels);
+ plan = &node->plan;
copy_generic_path_info(plan, (Path *) best_path);
plan->targetlist = tlist;
- plan->qual = NIL;
- plan->lefttree = NULL;
- plan->righttree = NULL;
- /* Compute sort column info, and adjust MergeAppend's tlist as needed */
(void) prepare_sort_from_pathkeys(plan, pathkeys,
best_path->path.parent->relids,
NULL,
@@ -1096,70 +1043,18 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path)
&node->collations,
&node->nullsFirst);
- /*
- * Now prepare the child plans. We must apply prepare_sort_from_pathkeys
- * even to subplans that don't need an explicit sort, to make sure they
- * are returning the same sort key columns the MergeAppend expects.
- */
- foreach(subpaths, best_path->subpaths)
+ /* Build the plan for each child */
+ foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
- Plan *subplan;
- int numsortkeys;
- AttrNumber *sortColIdx;
- Oid *sortOperators;
- Oid *collations;
- bool *nullsFirst;
-
- /* Build the child plan */
- /* Must insist that all children return the same tlist */
- subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
-
- /* Compute sort column info, and adjust subplan's tlist as needed */
- subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
- subpath->parent->relids,
- node->sortColIdx,
- false,
- &numsortkeys,
- &sortColIdx,
- &sortOperators,
- &collations,
- &nullsFirst);
-
- /*
- * Check that we got the same sort key information. We just Assert
- * that the sortops match, since those depend only on the pathkeys;
- * but it seems like a good idea to check the sort column numbers
- * explicitly, to ensure the tlists really do match up.
- */
- Assert(numsortkeys == node->numCols);
- if (memcmp(sortColIdx, node->sortColIdx,
- numsortkeys * sizeof(AttrNumber)) != 0)
- elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
- Assert(memcmp(sortOperators, node->sortOperators,
- numsortkeys * sizeof(Oid)) == 0);
- Assert(memcmp(collations, node->collations,
- numsortkeys * sizeof(Oid)) == 0);
- Assert(memcmp(nullsFirst, node->nullsFirst,
- numsortkeys * sizeof(bool)) == 0);
-
- /* Now, insert a Sort node if subplan isn't sufficiently ordered */
- if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
- {
- Sort *sort = make_sort(subplan, numsortkeys,
- sortColIdx, sortOperators,
- collations, nullsFirst);
-
- label_sort_with_costsize(root, sort, best_path->limit_tuples);
- subplan = (Plan *) sort;
- }
+ /* TODO: decrease limit tuples for each subpath */
+ Plan *subplan = plan = wrap_sort(root, node, subpath, pathkeys, limit_tuples);
+ if(limit_tuples > 0)
+ limit_tuples = Max(1, limit_tuples - subpath->rows);
subplans = lappend(subplans, subplan);
}
-
- node->partitioned_rels = best_path->partitioned_rels;
- node->mergeplans = subplans;
-
+ node->appendplans = subplans;
return (Plan *) node;
}
@@ -1566,7 +1461,7 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
* anyway. Usually create_projection_path will have detected that and set
* dummypp if we don't need a Result; but its decision can't be final,
* because some createplan.c routines change the tlists of their nodes.
- * (An example is that create_merge_append_plan might add resjunk sort
+ * (An example is that create_append_plan might add resjunk sort
* columns to a MergeAppend.) So we have to recheck here. If we do
* arrive at a different answer than create_projection_path did, we'll
* have made slightly wrong cost estimates; but label the plan with the
@@ -5274,21 +5169,100 @@ make_foreignscan(List *qptlist,
}
static Append *
-make_append(List *appendplans, List *tlist, List *partitioned_rels)
+make_append(NodeTag node_type, List *tlist, List *partitioned_rels)
{
- Append *node = makeNode(Append);
- Plan *plan = &node->plan;
+ Append *node;
+ Plan *plan;
- plan->targetlist = tlist;
+ if (node_type != T_Append && node_type != T_MergeAppend)
+ elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
+
+ switch(node_type)
+ {
+ case T_Append:
+ node = makeNode(Append);
+ break;
+ case T_MergeAppend:
+ node = (Append *) makeNode(MergeAppend);
+ break;
+ default:
+ elog(ERROR, "create_append_plan can only create Append or MergeAppend plans");
+ }
+
+ plan = &node->plan; plan->targetlist = tlist;
plan->qual = NIL;
plan->lefttree = NULL;
plan->righttree = NULL;
node->partitioned_rels = partitioned_rels;
- node->appendplans = appendplans;
+ node->appendplans = NIL;
+ node->numCols = 0;
+ node->sortColIdx = NULL;
+ node->sortOperators = NULL;
+ node->collations = NULL;
+ node->nullsFirst = NULL;
return node;
}
+static Plan *
+wrap_sort(PlannerInfo *root, Append* parentplan, Path *subpath, List* pathkeys, double limit_tuples)
+{
+ int numCols;
+ AttrNumber *sortColIdx;
+ Oid *sortOperators;
+ Oid *collations;
+ bool *nullsFirst;
+ Plan *subplan;
+
+ /* Build the child plan */
+ /* Must insist that all children return the same tlist */
+ subplan = create_plan_recurse(root, subpath, CP_EXACT_TLIST);
+
+ if (pathkeys != NIL)
+ {
+ /* Compute sort column info, and adjust subplan's tlist as needed */
+ subplan = prepare_sort_from_pathkeys(subplan, pathkeys,
+ subpath->parent->relids,
+ parentplan->sortColIdx,
+ false,
+ &numCols,
+ &sortColIdx,
+ &sortOperators,
+ &collations,
+ &nullsFirst);
+
+ /*
+ * Check that we got the same sort key information. We just Assert
+ * that the sortops match, since those depend only on the pathkeys;
+ * but it seems like a good idea to check the sort column numbers
+ * explicitly, to ensure the tlists really do match up.
+ */
+ Assert(numCols == parentplan->numCols);
+ if (memcmp(sortColIdx, parentplan->sortColIdx,
+ numCols * sizeof(AttrNumber)) != 0)
+ elog(ERROR, "MergeAppend child's targetlist doesn't match MergeAppend");
+ Assert(memcmp(sortOperators, parentplan->sortOperators,
+ numCols * sizeof(Oid)) == 0);
+ Assert(memcmp(collations, parentplan->collations,
+ numCols * sizeof(Oid)) == 0);
+ Assert(memcmp(nullsFirst, parentplan->nullsFirst,
+ numCols * sizeof(bool)) == 0);
+
+ /* Now, insert a Sort node if subplan isn't sufficiently ordered */
+ if (!pathkeys_contained_in(pathkeys, subpath->pathkeys))
+ {
+ Sort *sort = make_sort(subplan, numCols,
+ sortColIdx, sortOperators,
+ collations, nullsFirst);
+
+ label_sort_with_costsize(root, sort, limit_tuples);
+ subplan = (Plan *) sort;
+ }
+ }
+
+ return subplan;
+}
+
static RecursiveUnion *
make_recursive_union(List *tlist,
Plan *lefttree,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index f4e0a6ea3d..acf87cc917 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -25,6 +25,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
+#include "optimizer/plancat.h"
#include "optimizer/planmain.h"
@@ -176,6 +177,13 @@ query_planner(PlannerInfo *root, List *tlist,
*/
(*qp_callback) (root, qp_extra);
+ /*
+ * We consider generating pathkeys for partitioned tables only if the
+ * query has some ordering
+ */
+ if (root->query_pathkeys != NIL)
+ generate_pathkeys_for_partitioned_tables(root);
+
/*
* Examine any "placeholder" expressions generated during subquery pullup.
* Make sure that the Vars they need are marked as needed at the relevant
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7f146d670c..175088313b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3643,10 +3643,12 @@ create_grouping_paths(PlannerInfo *root,
paths = lappend(paths, path);
}
path = (Path *)
- create_append_path(grouped_rel,
+ create_append_path(root,
+ grouped_rel,
paths,
NULL,
0,
+ NIL,
NIL);
path->pathtarget = target;
}
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b0c9e94459..c9280af5c1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -932,12 +932,12 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
* targetlist or check quals.
*/
set_dummy_tlist_references(plan, rtoffset);
- Assert(splan->plan.qual == NIL);
- foreach(l, splan->partitioned_rels)
+ Assert(splan->plan.plan.qual == NIL);
+ foreach(l, splan->plan.partitioned_rels)
{
lfirst_int(l) += rtoffset;
}
- foreach(l, splan->mergeplans)
+ foreach(l, splan->plan.appendplans)
{
lfirst(l) = set_plan_refs(root,
(Plan *) lfirst(l),
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 1103984779..19fc190f7d 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2589,7 +2589,7 @@ finalize_plan(PlannerInfo *root, Plan *plan,
{
ListCell *l;
- foreach(l, ((MergeAppend *) plan)->mergeplans)
+ foreach(l, ((MergeAppend *) plan)->plan.appendplans)
{
context.paramids =
bms_add_members(context.paramids,
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 3e0c3de86d..fc615a4314 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -590,7 +590,7 @@ generate_union_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
@@ -702,7 +702,7 @@ generate_nonunion_path(SetOperationStmt *op, PlannerInfo *root,
/*
* Append the child results together.
*/
- path = (Path *) create_append_path(result_rel, pathlist, NULL, 0, NIL);
+ path = (Path *) create_append_path(root, result_rel, pathlist, NULL, 0, NIL, NIL);
/* We have to manually jam the right tlist into the path; ick */
path->pathtarget = create_pathtarget(root, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 26567cb7f6..899769eca4 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1200,11 +1200,21 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals,
* Note that we must handle subpaths = NIL, representing a dummy access path.
*/
AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
- int parallel_workers, List *partitioned_rels)
+create_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths,
+ Relids required_outer, int parallel_workers,
+ List *partitioned_rels, List *pathkeys)
{
AppendPath *pathnode = makeNode(AppendPath);
ListCell *l;
+ double limit_tuples;
+ Cost input_startup_cost;
+ Cost input_total_cost;
+
+ /*
+ * Just to make sure that nobody is trying to build a parallel sorted
+ * append path
+ */
+ Assert(parallel_workers > 0 ? pathkeys == NIL : true);
pathnode->path.pathtype = T_Append;
pathnode->path.parent = rel;
@@ -1214,7 +1224,7 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel;
pathnode->path.parallel_workers = parallel_workers;
- pathnode->path.pathkeys = NIL; /* result is always considered unsorted */
+ pathnode->path.pathkeys = pathkeys;
pathnode->partitioned_rels = list_copy(partitioned_rels);
pathnode->subpaths = subpaths;
@@ -1229,15 +1239,49 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
pathnode->path.rows = 0;
pathnode->path.startup_cost = 0;
pathnode->path.total_cost = 0;
+
+ if (root && bms_equal(rel->relids, root->all_baserels))
+ pathnode->limit_tuples = root->limit_tuples;
+ else
+ pathnode->limit_tuples = -1.0;
+
+ limit_tuples = pathnode->limit_tuples;
+
foreach(l, subpaths)
{
Path *subpath = (Path *) lfirst(l);
+ input_startup_cost = subpath->startup_cost;
+ input_total_cost = subpath->total_cost;
+
+ if(pathkeys && !pathkeys_contained_in(pathkeys, subpath->pathkeys))
+ {
+ Path sort_path; /* dummy for result of cost_sort */
+
+ cost_sort(&sort_path,
+ root,
+ pathkeys,
+ subpath->total_cost,
+ subpath->rows,
+ subpath->pathtarget->width,
+ 0.0,
+ work_mem,
+ limit_tuples);
+ input_startup_cost += sort_path.startup_cost;
+ input_total_cost = sort_path.total_cost;
+ }
pathnode->path.rows += subpath->rows;
if (l == list_head(subpaths)) /* first node? */
- pathnode->path.startup_cost = subpath->startup_cost;
- pathnode->path.total_cost += subpath->total_cost;
+ pathnode->path.startup_cost = input_startup_cost;
+ /*
+ * Consider that the number of tuples to be fetched decreases
+ * for every subsequent partition
+ */
+ if(limit_tuples > 0)
+ limit_tuples = Max(1, limit_tuples - subpath->rows);
+
+ pathnode->path.total_cost += input_total_cost;
pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index cac46bedf9..9bd8fb89f4 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -35,6 +35,7 @@
#include "nodes/makefuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
+#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/predtest.h"
#include "optimizer/prep.h"
@@ -1269,6 +1270,142 @@ get_relation_constraints(PlannerInfo *root,
return result;
}
+/*
+ * Generate pathkeys for Range-based partitions
+ */
+void
+generate_pathkeys_for_partitioned_tables(PlannerInfo *root)
+{
+ int i;
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[i];
+
+ /* Only base relation can be partitionned */
+ if (rel && has_useful_pathkeys(root, rel))
+ {
+ Index varno = rel->relid;
+ Index parent_varno = varno;
+ RelOptInfo *parent_rel = rel;
+ Relation relation;
+ RangeTblEntry *rte = planner_rt_fetch(varno, root);
+
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ continue;
+
+ while (parent_rel->reloptkind != RELOPT_BASEREL)
+ {
+ ListCell *lc;
+
+ foreach(lc, root->append_rel_list)
+ {
+ AppendRelInfo *ari = lfirst(lc);
+
+ if (ari->child_relid == parent_rel->relid)
+ {
+ parent_rel = root->simple_rel_array[ari->parent_relid];
+ break;
+ }
+
+ /* Should never happen: we should always be able to climb up the
+ * inheritance tree */
+ if(!lc)
+ elog(ERROR, "Unable to find parent table for child ");
+ }
+ }
+ parent_varno = parent_rel->relid;
+
+ /*
+ * Ignore base rel if it's not a partitionned table. We'll still
+ * have to open it to verify if it's a range partitionned table
+ */
+ if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+ continue;
+
+ relation = heap_open(rte->relid, NoLock);
+
+ /*
+ * If the partitioning has natural data ordering (ie. a range
+ * strategy and no default partition), build pathkeys
+ * for the partitioning keys
+ */
+ if(relation->rd_partkey->strategy == PARTITION_STRATEGY_RANGE &&
+ get_default_oid_from_partdesc(relation->rd_partdesc) == InvalidOid)
+ {
+ int j;
+ ListCell *lc;
+ Oid equality_op;
+ EquivalenceClass *ec;
+ List *opfamilies;
+ List *asc_pathkeys = NIL;
+ List *desc_pathkeys = NIL;
+
+ Assert(rel->part_pathkeys == NIL);
+
+ /* Lookup individual vars from the pathtarget */
+ /* FIXME: refactor this in an external function */
+ lc = list_head(relation->rd_partkey->partexprs);
+ for (j=0; j < relation->rd_partkey->partnatts; j++)
+ {
+ AttrNumber attno = relation->rd_partkey->partattrs[j];
+ Expr *expr = NULL;
+
+ /* This is not an attribute, but an expression */
+ if(attno == InvalidAttrNumber)
+ {
+ /* Should never append : we should be able to fetch
+ * an expression for anything in the partition key */
+ if (!lc)
+ elog(ERROR, "Could not find expression for partition key");
+ expr = lfirst(lc);
+ lc = lnext(lc);
+ }
+ else
+ {
+ expr = (Expr*) makeVar(parent_varno, attno, relation->rd_partkey->parttypid[j],
+ relation->rd_partkey->parttypmod[j],
+ relation->rd_partkey->parttypcoll[j],
+ 0);
+ }
+
+ equality_op = get_opfamily_member(relation->rd_partkey->partopfamily[j],
+ relation->rd_partkey->partopcintype[j],
+ relation->rd_partkey->partopcintype[j],
+ BTEqualStrategyNumber);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ ec = get_eclass_for_sort_expr(root, expr,
+ NULL, opfamilies,
+ relation->rd_partkey->partopcintype[j],
+ relation->rd_partkey->partcollation[j],
+ 0, rel->relids, true);
+ asc_pathkeys = lappend(asc_pathkeys, make_canonical_pathkey(root,
+ ec,
+ relation->rd_partkey->partopfamily[j],
+ BTLessStrategyNumber, false));
+ desc_pathkeys = lappend(desc_pathkeys, make_canonical_pathkey(root,
+ ec,
+ relation->rd_partkey->partopfamily[j],
+ BTGreaterStrategyNumber, true));
+ }
+
+ /* FIXME: this is as dirty as it gets */
+ if(list_length(asc_pathkeys) > list_length(root->query_pathkeys))
+ {
+ asc_pathkeys = truncate_useless_pathkeys(root, rel, asc_pathkeys);
+ desc_pathkeys = truncate_useless_pathkeys(root, rel, desc_pathkeys);
+ }
+
+ if(asc_pathkeys)
+ rel->part_pathkeys = lappend(rel->part_pathkeys, asc_pathkeys);
+
+ if(desc_pathkeys)
+ rel->part_pathkeys = lappend(rel->part_pathkeys, desc_pathkeys);
+ }
+ heap_close(relation, NoLock);
+ }
+ }
+}
+
/*
* get_relation_statistics
* Retrieve extended statistics defined on the table.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 077e89ae43..86ecf70cc2 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -151,6 +151,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->boundinfo = NULL;
rel->part_rels = NULL;
rel->partexprs = NULL;
+ rel->part_pathkeys = NIL;
/*
* Pass top parent's relids down the inheritance hierarchy. If the parent
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a382331f41..283af54e09 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -248,6 +248,17 @@ typedef struct Append
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *appendplans;
+ /* remaining fields are just like the sort-key info in struct Sort */
+ /* FIXME: We should either
+ * - define Append as sort + appendplans
+ * - define Append "like" sort and define MergeAppend as Append, only with
+ * a different tag
+ */
+ int numCols; /* number of sort-key columns */
+ AttrNumber *sortColIdx; /* their indexes in the target list */
+ Oid *sortOperators; /* OIDs of operators to sort them by */
+ Oid *collations; /* OIDs of collations */
+ bool *nullsFirst; /* NULLS FIRST/LAST directions */
} Append;
/* ----------------
@@ -257,16 +268,7 @@ typedef struct Append
*/
typedef struct MergeAppend
{
- Plan plan;
- /* RT indexes of non-leaf tables in a partition tree */
- List *partitioned_rels;
- List *mergeplans;
- /* remaining fields are just like the sort-key info in struct Sort */
- int numCols; /* number of sort-key columns */
- AttrNumber *sortColIdx; /* their indexes in the target list */
- Oid *sortOperators; /* OIDs of operators to sort them by */
- Oid *collations; /* OIDs of collations */
- bool *nullsFirst; /* NULLS FIRST/LAST directions */
+ Append plan;
} MergeAppend;
/* ----------------
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 48e6012f7f..81c962d91b 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -646,6 +646,8 @@ typedef struct RelOptInfo
struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
* stored in the same order of bounds */
List **partexprs; /* Partition key expressions. */
+ List *part_pathkeys; /* Pathkey representing the partition key if a
+ natural order exists*/
} RelOptInfo;
/*
@@ -1232,6 +1234,7 @@ typedef struct AppendPath
/* RT indexes of non-leaf tables in a partition tree */
List *partitioned_rels;
List *subpaths; /* list of component Paths */
+ double limit_tuples; /* hard limit on output tuples, or -1 */
} AppendPath;
#define IS_DUMMY_PATH(p) \
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e372f8862b..1496bdc0dc 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -63,9 +63,13 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
List *bitmapquals);
extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
List *tidquals, Relids required_outer);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths,
- Relids required_outer, int parallel_workers,
- List *partitioned_rels);
+extern AppendPath *create_append_path(PlannerInfo *root,
+ RelOptInfo *rel,
+ List *subpaths,
+ Relids required_outer,
+ int parallel_workers,
+ List *partitioned_rels,
+ List *pathkeys);
extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
RelOptInfo *rel,
List *subpaths,
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 71f0faf938..6d96d54b5d 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -35,6 +35,8 @@ extern void estimate_rel_size(Relation rel, int32 *attr_widths,
extern int32 get_relation_data_width(Oid relid, int32 *attr_widths);
+extern void generate_pathkeys_for_partitioned_tables(PlannerInfo *root);
+
extern bool relation_excluded_by_constraints(PlannerInfo *root,
RelOptInfo *rel, RangeTblEntry *rte);
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
index c698faff2f..03d9cbb242 100644
--- a/src/test/regress/expected/inherit.out
+++ b/src/test/regress/expected/inherit.out
@@ -1952,13 +1952,13 @@ explain (costs off) select min(a), max(a) from parted_minmax where b = '12345';
Result
InitPlan 1 (returns $0)
-> Limit
- -> Merge Append
+ -> Append
Sort Key: parted_minmax1.a
-> Index Only Scan using parted_minmax1i on parted_minmax1
Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
InitPlan 2 (returns $1)
-> Limit
- -> Merge Append
+ -> Append
Sort Key: parted_minmax1_1.a DESC
-> Index Only Scan Backward using parted_minmax1i on parted_minmax1 parted_minmax1_1
Index Cond: ((a IS NOT NULL) AND (b = '12345'::text))
On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaud <rjuju123@gmail.com> wrote:
PFA v3 of the patch, once again rebased and that now use all the newly
available infrastructure.I also added a check that a sorted append can't be generated when a
default partition has been declared.
I just took a quick look at this and was kind of surprised to see that
it didn't look much like what I would expect.
What I would expect is:
1. The PartitionDescData grows a new flag indicating whether
partitions can be regarded as being in bound order. This is true for
range partitions without a default partition, list partitions without
a default partition and without interleaved bounds, and maybe other
cases we want to worry about later. We set this flag correctly when
we build the PartitionDesc and propagate it into the RelOptInfo for
the partitioned table when we set up its other partitioning details.
2. generate_mergeappend_paths() gets renamed to
generate_sorted_append_paths(). At the top of the function, it checks
whether the above flag is set; if not, give up on this optimization.
Otherwise, figure out what the set of pathkeys that correspond to the
partition key would look like. Call these the
partition_order_pathkeys.
3. For each set of possible pathkeys, it checks whether the proposed
pathkeys are equal to (or an initial prefix of) the
partition_order_pathkeys.
4. If so, instead of calling create_merge_append_path(), it calls
create_append_path().
5. create_append_path() gets the proposed pathkeys via a new List
*pathkeys argument and sticks them on the path.
And that's it. The extensive patch does a lot of other stuff, like
whacking around the structure of AppendPath vs. MergeAppendPath, and
I'm not sure why we need or want any of that. I might be missing
something, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 22, 2017 at 9:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaud <rjuju123@gmail.com> wrote:
PFA v3 of the patch, once again rebased and that now use all the newly
available infrastructure.I also added a check that a sorted append can't be generated when a
default partition has been declared.I just took a quick look at this and was kind of surprised to see that
it didn't look much like what I would expect.What I would expect is:
[...]
Thanks a lot for the pointers, that's really helpful!
The extensive patch does a lot of other stuff, like
whacking around the structure of AppendPath vs. MergeAppendPath, and
I'm not sure why we need or want any of that. I might be missing
something, though.
That was one of the first question we had with the initial POC. The
reason is that this "sorted append" is not using a merge algorithm but
just appending partitions in the right order, so it looked like we
could either create a new SortedAppend node, or use current Append
node and allow it to return sorted data. We chose the 2nd solution,
and ended up with a lot of duplicated code (all the code for the
ordering), so we tried to have Append and MergeAppend share this code.
I'm not at all satisfied with current shape, but I'm still not sure on
what node to use for this. Do you have any idea on this?
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 22, 2017 at 4:23 PM, Julien Rouhaud <rjuju123@gmail.com> wrote:
That was one of the first question we had with the initial POC. The
reason is that this "sorted append" is not using a merge algorithm but
just appending partitions in the right order, so it looked like we
could either create a new SortedAppend node, or use current Append
node and allow it to return sorted data. We chose the 2nd solution,
and ended up with a lot of duplicated code (all the code for the
ordering), so we tried to have Append and MergeAppend share this code.
I'm not at all satisfied with current shape, but I'm still not sure on
what node to use for this. Do you have any idea on this?
During planning, *every* node has a list of pathkeys associated with
it which represent the presumed output ordering. You can support
ordered append paths without changing any data structures at all; it's
just a matter of putting pathkeys into an AppendPath.
The reason why MergeAppend has extra stuff in the node (numCols,
sortColIdx, etc.) is so that it can actually perform the sort at
runtime. But this feature requires no runtime support -- that's kinda
the point -- so there's no need for it to carry that information, or
to add any new nodes.
At least, IIUC.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 22, 2017 at 11:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
During planning, *every* node has a list of pathkeys associated with
it which represent the presumed output ordering. You can support
ordered append paths without changing any data structures at all; it's
just a matter of putting pathkeys into an AppendPath.
I totally agree.
The reason why MergeAppend has extra stuff in the node (numCols,
sortColIdx, etc.) is so that it can actually perform the sort at
runtime. But this feature requires no runtime support -- that's kinda
the point -- so there's no need for it to carry that information, or
to add any new nodes.At least, IIUC.
That's true, but numCols, sortColdIdx etc are also used to display the
sort key in an explain. If an append can return sorted data, it
should also display the sort information, so I think these fields are
still required in an Append node.
If it's ok to only add these fields in an Append node for the explain
part, I'll revert the Append/MergeAppend refactoring and do some other
cleanup as you advised earlier.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
That's true, but numCols, sortColdIdx etc are also used to display the
sort key in an explain. If an append can return sorted data, it
should also display the sort information, so I think these fields are
still required in an Append node.
I don't think so. An index scan doesn't output that information, nor
does a nested loop which inherits a sort order from its outer path. I
think the rule is that a plan node which takes steps to get the data
into a certain order might want to output something about that, but a
plan node which somehow gets that ordering without taking any explicit
action doesn't print anything.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Sep 26, 2017 at 2:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
That's true, but numCols, sortColdIdx etc are also used to display the
sort key in an explain. If an append can return sorted data, it
should also display the sort information, so I think these fields are
still required in an Append node.I don't think so. An index scan doesn't output that information, nor
does a nested loop which inherits a sort order from its outer path. I
think the rule is that a plan node which takes steps to get the data
into a certain order might want to output something about that, but a
plan node which somehow gets that ordering without taking any explicit
action doesn't print anything.
Oh, ok that indeed makes sense. As I said I'll remove all the useless
noise and send an updated patch. Thanks again.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 27, 2017 at 2:59 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Sep 26, 2017 at 2:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
That's true, but numCols, sortColdIdx etc are also used to display the
sort key in an explain. If an append can return sorted data, it
should also display the sort information, so I think these fields are
still required in an Append node.I don't think so. An index scan doesn't output that information, nor
does a nested loop which inherits a sort order from its outer path. I
think the rule is that a plan node which takes steps to get the data
into a certain order might want to output something about that, but a
plan node which somehow gets that ordering without taking any explicit
action doesn't print anything.Oh, ok that indeed makes sense. As I said I'll remove all the useless
noise and send an updated patch. Thanks again.
Waiting for a patch update for two months now, so marked as returned
with feedback.
--
Michael